I’m using EmrEtlRunner (Beanstalk/Clojure + S3 + EMR + Redshift), and I had an issue where the EMR job ran for 2 days (it normally takes under 20 minutes), so I had to terminate it.
Once it was terminated, I ran “snowplow-runner-and-loader.sh” again, but I had to move the files out of the S3 bucket folders (processing, shredded, enriched) because it throws an error that the folders aren’t empty, which is fine.
Anyway, when it all ran successfully again, I found I was missing a couple of days of data. How would I go about getting that back? I still have all the files from processing, shredded and enriched (I didn’t delete anything, just moved them).
Also, I run it 6 times per day. Would reprocessing only part of a day cause duplicate rows in the atomic.events table?