EmrEtlRunner running for days at Step "Shred Enriched Events"


#1

Hello,

I am running Snowplow EmrEtlRunner 91 on a cluster of 10 m4.4xlarge nodes. The job runs up until the “Elasticity Spark Step: Shred Enriched Events” step, at which point it runs for days and never finishes. This doesn’t happen on every run, and there appears to be little to no pattern to when it happens. While it is stuck, a backlog of files builds up that then has to be worked through, which is a big hassle.

When I click through on the EMR dashboard to the stderr logs, I just see days’ worth of:

18/04/08 18:05:45 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)
18/04/08 18:05:46 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)
18/04/08 18:05:47 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)
18/04/08 18:05:48 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)

Looking at the stderr logs, there appears to be no difference between a run that goes on for days and one that doesn’t, except that the latter reaches a finished state.
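To put a number on how long an application has been reported as RUNNING, the report lines quoted above can be summarised with a small script. This is only a sketch: it assumes the stderr log has been saved locally and that every report line has exactly the format shown in this post. The function name `summarize_reports` is made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
# 18/04/08 18:05:45 INFO Client: Application report for application_..._0005 (state: RUNNING)
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO Client: "
    r"Application report for (application_\d+_\d+) \(state: (\w+)\)"
)


def summarize_reports(lines):
    """Return {application_id: (first_seen, last_seen, last_state)}."""
    summary = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not an application report
        # Assumption: the timestamp is year/month/day, as in the lines above.
        seen = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        app_id, state = match.group(2), match.group(3)
        first_seen = summary.get(app_id, (seen,))[0]
        summary[app_id] = (first_seen, seen, state)
    return summary


sample = [
    "18/04/08 18:05:45 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)",
    "18/04/08 18:05:46 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)",
    "18/04/08 18:05:47 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)",
    "18/04/08 18:05:48 INFO Client: Application report for application_1522951903812_0005 (state: RUNNING)",
]

for app_id, (first_seen, last_seen, state) in summarize_reports(sample).items():
    # Prints the application id, its last reported state, and the time span covered.
    print(app_id, state, last_seen - first_seen)
```

Run against the full stderr file (for example `summarize_reports(open(path))`), this shows when the application was first and last reported and the final state the client saw.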

I don’t believe that my nodes are running out of disk space: [screenshot]

If anyone has run into this issue before and solved it, or has any ideas about how to solve it, I would greatly appreciate the knowledge!


#2

Hey @frankcash, I think you can find the cause of failure in the YARN container logs, specifically somewhere under containers/application_1522951903812_0005/.
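If it helps, here is a rough sketch for scanning those container logs once they have been copied to a local directory. It makes several assumptions: the logs are plain text or gzip-compressed files, the function name `find_suspect_lines` is made up for illustration, and the marker strings are only guesses at what a failing Spark executor might print, not an exhaustive list.

```python
import gzip
from pathlib import Path

# Assumed markers; adjust to whatever actually shows up in your logs.
MARKERS = ("ERROR", "Exception", "OutOfMemoryError", "Container killed")


def open_log(path):
    """Open a log file as text, decompressing it if it ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_suspect_lines(log_dir, markers=MARKERS):
    """Return (file, line number, line) for every line containing a marker."""
    hits = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        with open_log(path) as handle:
            for lineno, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    hits.append((str(path), lineno, line.rstrip()))
    return hits
```

Calling `find_suspect_lines("containers/application_1522951903812_0005")` on a local copy of that directory returns the matching lines with their file and line number, which should narrow down which container to read in full.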