Enrich/good folder contains empty 'run=[date]_$folder$' files


#1

I have just set up the Snowplow enrich process using R97 Knossos.

The process completes successfully, but the enrich/good folder contains no folders, only empty files whose names follow the pattern

run=[timestamp]_$folder$

In enrich/bad I can see matching folders, with two files (_SUCCESS and part-....-.txt).

In archive/enrich I can see a matching folder with the same file pattern as above, plus an empty file named after the run with _$folder$ appended.

I’ve noticed there has been a similar issue which should now be fixed here: https://github.com/snowplow/snowplow/issues/3139


#2

Hello @hanskohls,

These _$folder$ files are harmless. We planned to remove them, but that ticket has been postponed.

enrich/good should not contain data after the pipeline has finished: the S3DistCp step archives the folders into archive/enrich and leaves these ghost _$folder$ files behind.

If the data is present in archive/enrich, then I don’t see any reason for you to worry.
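
If you want to check this mechanically, here is a rough Python sketch of the idea: given a listing of object keys, it separates the empty _$folder$ markers from real data objects and reports which run folders under a prefix actually hold data. The function names, the example timestamp and the part file name are made up for illustration. How you obtain the key listing (AWS CLI, an SDK, etc.) is up to you and is not shown.

```python
# Suffix that marks the empty placeholder objects discussed in this thread.
FOLDER_MARKER_SUFFIX = "_$folder$"


def split_markers(keys):
    """Split object keys into (marker keys, data keys)."""
    markers = [k for k in keys if k.endswith(FOLDER_MARKER_SUFFIX)]
    data = [k for k in keys if not k.endswith(FOLDER_MARKER_SUFFIX)]
    return markers, data


def runs_with_data(keys, prefix):
    """Return the run=... folder names under `prefix` that contain
    at least one non-marker object."""
    _, data = split_markers(keys)
    runs = set()
    for key in data:
        if not key.startswith(prefix):
            continue
        # First path segment after the prefix, e.g. "run=2018-01-01-00-00-00".
        first, sep, _ = key[len(prefix):].partition("/")
        if sep and first.startswith("run="):
            runs.add(first)
    return runs


# Hypothetical listing; the timestamp and part file name are invented.
keys = [
    "enrich/good/run=2018-01-01-00-00-00_$folder$",
    "archive/enrich/run=2018-01-01-00-00-00_$folder$",
    "archive/enrich/run=2018-01-01-00-00-00/part-00000.txt",
]

print(runs_with_data(keys, "enrich/good/"))     # nothing left after archiving
print(runs_with_data(keys, "archive/enrich/"))  # the run that was archived
```

If every run has an entry under archive/enrich and nothing is reported under enrich/good, that matches the expected state described above.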


#3

For more information, check out the documentation from AWS:

https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-empty-files/