Rerunning logs (new to Snowplow)

datawise · December 18, 2019, 4:24pm

Hi - I’m very new to Snowplow. Two weeks ago I didn’t know it existed, but yesterday I successfully completed an AWS setup. There was a lot of trial and error while I worked through the setup, including several failed EMR runs. (The documentation is fantastic, btw.)

I now need to reprocess all of the Elastic Beanstalk collector logs. I took over the final setup and configuration of Snowplow last week. The tracker and collector have been running since last June, so we have 6+ months of logs that I want to get into our Redshift warehouse. My problem is that I don’t know how to force Snowplow to reprocess all of the Elastic Beanstalk logs going back to last June.

I have tried to make this happen by deleting all of the files in every S3 location listed in config.yml except Raw:In, but EmrEtlRunner still only processes the files from the last successful run. I’ve confirmed that the raw logs from June are still present in the Raw:In S3 bucket (the Elastic Beanstalk log bucket). I’m not sure what I’m missing, and I haven’t been able to find an answer to this question online.

Can someone please point me in the right direction?

Thanks,

Tony

ihor · December 18, 2019, 11:45pm

@datawise, I assume you have set up the Clojure collector. I’m not sure which logs you are referring to. The following post might clarify things for you: How does EmrEtlRunner determine what the latest logs are in the raw "in" bucket?

datawise · December 19, 2019, 4:48pm

Thanks ihor - that’s exactly what I was looking for. I should have mentioned that we are using the Clojure collector. Much appreciated.
