EMR jobflow failing on Hadoop Enrich step after a few seconds


#1

Hi Snowplowers,

I have a weird problem with EmrEtlRunner. I scheduled it to run every day at 4AM, and it does, at least most of the time.

When it fails, it quits very early in the enrichment step and leaves no logs, even though I have level: DEBUG set in my config file.

Any ideas?

This is my crontab:

00 4 * * * /snowplow/snowplow-runner-and-loader.sh


#2

Hi Joao,

We have had a similar issue, although mostly with the Hadoop Shred job - we raised it with AWS and heard back:

I’ve been looking at the clusters that you sent with another engineer
here and we believe that this is an instance-controller issue. There are
certain rare situations if there’s a long process name, it can break
instance-controller which looks like what’s happening here. A couple of
your command includes a very long argument and we believe that’s what’s
causing the instability. This exists in emr-4.3.0 and older.

We have been upgrading jobs to emr-4.5.0 and so far so good - the incidence of this problem seems to have declined. Can you try the same on your side and report back?


"Elasticity Scalding Step: Shred Enriched Events" failures
#3

Hi Alex,

Upgrading jobs to 4.5.0 solved the problem.

Thank you!


#4

Thanks Joao. For others reading this - we have not seen this error again since upgrading jobs to 4.5.0.


#5

How do I upgrade jobs to 4.5.0?


#6

Hi @dweitzenfeld,

Change the ami_version to 4.5.0 in config.yml:

  emr:
    ami_version: 4.5.0
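
If you prefer to make and check the change from the shell, something like the sketch below should do it. It runs against a throwaway sample file, so it is safe to try as-is. The sample's starting value of 4.3.0 is only an assumption for illustration. For a real run, point CONFIG at the config.yml you pass to EmrEtlRunner instead of creating the sample.

```shell
# Throwaway sample containing only the two lines quoted above.
# The starting value 4.3.0 is an assumption for illustration.
CONFIG="$(mktemp)"
printf '  emr:\n    ami_version: 4.3.0\n' > "$CONFIG"

# Replace whatever follows "ami_version:" with 4.5.0; indentation is untouched.
sed 's/ami_version:.*/ami_version: 4.5.0/' "$CONFIG" > "$CONFIG.new" && mv "$CONFIG.new" "$CONFIG"

# Show the line to confirm the edit.
grep -n 'ami_version' "$CONFIG"
```

Editing the file by hand works just as well; the grep at the end is the useful part, since it confirms which version the next run will request.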

Regards,
Ihor