After updating to v87 remaining processing_$folder$


#1

Hey,

We updated from v85 to v87 a few days ago and have run into some problems. There is a leftover “processing_$folder$” in our snowplow-logs directory (basically the raw events input directory). I don’t know why it is there, and I believe it should have been deleted once the run finished.

A more obscure thing is that the cron job exits every time with “Error running EmrEtlRunner, exiting with return code 1. StorageLoader not run”, yet it works if I just call the script in the terminal. The script is snowplow-runner-and-loader.sh, and it is exactly like the original one in the repo. If I then run the StorageLoader alone, it breaks with “couldn't find atomic-events folder in enriched-good”. So why doesn’t the EMR ETL run, and what else does it check?


#2

I encountered another problem: the EMR process moves some files from processing back into the /logs directory. First it puts everything in processing, and then, when the EMR process starts, it moves some of them back out of that folder. That’s super weird.


#3

Hi @tclass,

Did you by any chance bump the AMI version? I encountered these $folder$ files when I worked with AMI >4.5.0. I also know that some S3 clients used to create such files so that S3 could be treated as a filesystem, but I’m not sure whether that is still the case today.
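
In case it helps to check how many of these markers are lying around: Hadoop-based S3 clients have historically written empty objects whose key ends in _$folder$ to stand in for directories. Below is a minimal sketch, assuming you already have a list of object keys from the bucket (how you list them is left out). The function names and the sample keys are made up for illustration and are not part of Snowplow.

```python
# Suffix that Hadoop-style S3 clients append to empty "directory marker" objects.
FOLDER_MARKER_SUFFIX = "_$folder$"


def find_folder_markers(keys):
    """Return the keys that look like folder-marker objects."""
    return [key for key in keys if key.endswith(FOLDER_MARKER_SUFFIX)]


def marked_prefixes(keys):
    """Return the directory-style prefix each marker stands for."""
    return [
        key[: -len(FOLDER_MARKER_SUFFIX)] + "/"
        for key in find_folder_markers(keys)
    ]


if __name__ == "__main__":
    # Hypothetical key listing, only for illustration.
    sample_keys = [
        "snowplow-logs/processing_$folder$",
        "snowplow-logs/processing/part-0001.gz",
        "snowplow-logs/raw-0002.gz",
    ]
    for marker in find_folder_markers(sample_keys):
        print("marker object:", marker)
    for prefix in marked_prefixes(sample_keys):
        print("stands for prefix:", prefix)
```

This only tells you which keys are markers. Whether it is safe to delete them depends on your setup, so I would check that before removing anything.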


#4

The AMI hasn’t changed for the last 2 releases and we are on 4.5.0 :frowning: What’s really weird is that it puts stuff from processing back into logs/. Any idea why that could be the case? @anton


#5

Hi @tclass - what collector format are you using? We saw this issue with one of our pipelines recently; it was using the CloudFront Collector.

Yes, this is a bug, thanks for detecting this:

https://github.com/snowplow/snowplow/issues/3139


#6

Yes, we’re also using the CloudFront Collector.