Enrich/good folder contains empty 'run=[date]_$folder$' files


I have just set up the Snowplow enrich process using R97 Knossos.

The process completes successfully, but the enrich/good folder contains no folders, just empty files with the name pattern run=[date]_$folder$.


In enrich/bad I can see matching folders, each with two files (_SUCCESS and part-....-.txt).

In archive/enrich I can see a matching folder with the same file pattern as above, but also an empty file with the run name and _$folder$ appended.

I’ve noticed there has been a similar issue which should now be fixed here: https://github.com/snowplow/snowplow/issues/3139


Hello @hanskohls,

These _$folder$ files are harmless. We had plans to remove them, but pushed that ticket back.

enrich/good should not contain data after the pipeline has finished: the folders get archived into archive/enrich by the S3DistCp step, which leaves behind these ghost _$folder$ files.

If the data is present in archive/enrich, then I don’t see any reason for you to worry.
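As an illustration of what these markers look like, here is a minimal sketch of filtering a recursive listing for them; the listing lines below are fabricated, and the grep pattern escapes the literal dollar signs:

```shell
# Fabricated `aws s3 ls --recursive` output: one ghost marker and one data file
listing='2017-10-01 12:00:00          0 archive/enrich/run=2017-10-01-00-00-00_$folder$
2017-10-01 12:00:01       1024 archive/enrich/run=2017-10-01-00-00-00/part-00000'

# `\$` makes grep match a literal dollar sign instead of end-of-line
printf '%s\n' "$listing" | grep '_\$folder\$'
```

Only the marker line survives the filter; the real part file is left alone.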


For more information, check out the documentation from AWS:



Is there a way to use the AWS CLI to delete all the _$folder$ files?


What I have done before is:

export BASE_RM_PATH=example-bucket/example-path; for f in $(aws s3 ls --recursive s3://$BASE_RM_PATH/ | grep '_\$folder\$' | perl -nae 'print "$F[3]\n";'); do echo "aws s3 rm s3://$BASE_RM_PATH/$f"; done

Once you are happy with the result, simply remove the “echo” and the quotes to execute.


Thanks @knservis. I tried using --include with rm but didn’t quite get it working…


Did you run it as is (replacing BASE_RM_PATH=example-bucket/example-path with the correct path)? If yes, did you get the output you expected (a whole series of aws s3 rm statements listing the files to be deleted)? @bhavin


Ah… no, I meant I tried running aws s3 rm --recursive --include “.path.” to filter and remove only one file from all the folders :slight_smile:


@bhavin Please let us know whether you tried my suggestion. Whether it worked, didn’t work, or you decided to do something else or nothing at all, your follow-up will help others reading this thread.


Hey @knservis, I had to make a slight modification:
from perl -nae 'print "$F[3]\n";' to perl -nae 'print((split("/", $F[3]))[-1], "\n");'. Since we are using the same $BASE_RM_PATH, we only need the file name (run=date...); without this change, the script repeats the prefix twice.

# modified version

export BASE_RM_PATH=<s3bucket>/<prefix>

for f in $(aws s3 ls --recursive s3://$BASE_RM_PATH/ | grep '_\$folder\$' | perl -nae 'print((split("/", $F[3]))[-1], "\n");'); do
echo "aws s3 rm s3://$BASE_RM_PATH/$f"
done

I ended up using this one-liner:

echo "enter s3path:"; \
read s3path; \
aws s3 ls --recursive s3://$s3path/ \
| awk -F '/' '/_\$folder\$/  { print $3 }' \
| xargs -I {} echo aws s3 rm s3://$s3path/{}
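One caveat on the awk variant above, shown here on a fabricated listing line: -F '/' splits the whole line on slashes and prints the third field, so it extracts the bare file name only when the key sits exactly two path levels deep (as with the prefix layout assumed below):

```shell
# Fabricated listing line; the key is <prefix>/<subdir>/<file>, i.e. two slashes deep
line='2017-10-01 12:00:00 0 enriched/archive/run=2017-10-01-00-00-00_$folder$'

# -F '/' splits the whole line on "/"; $3 is the file name only at this exact depth,
# and the /_\$folder\$/ pattern keeps just the ghost-marker lines
echo "$line" | awk -F '/' '/_\$folder\$/ { print $3 }'
# prints: run=2017-10-01-00-00-00_$folder$
```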

But what I really wanted was to use the --recursive, --include, and --exclude flags with rm and let the AWS CLI do the work for me, which is faster and means I don’t have to worry about intermediate errors, clean-up, tracking, etc.
(I finally got it this time…)

read s3path; \
aws s3 rm --dryrun s3://$s3path/ \
--recursive \
--exclude '*' \
--include '*_$folder$'

Let me know what you think! And thanks for the pointer above…


That exclude/include trick in the last example seems to work well, and it will be faster than listing and then doing an rm for each file. That’s very helpful, @bhavin, thanks.