I’m using Snowplow r77 and, over the last 5 days, the Shred Enriched Events step has started failing in both my staging and production environments.
Here is the output I get:
Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-2K7LCNCVIN965 failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL Staging: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2016-04-11 08:09:40 +0000 - ]
- Elasticity S3DistCp Step: Raw S3 -> HDFS: COMPLETED ~ 00:01:08 [2016-04-11 08:09:40 +0000 - 2016-04-11 08:10:48 +0000]
- Elasticity Scalding Step: Enrich Raw Events: COMPLETED ~ 00:02:20 [2016-04-11 08:10:55 +0000 - 2016-04-11 08:13:15 +0000]
- Elasticity S3DistCp Step: Enriched HDFS -> S3: COMPLETED ~ 00:00:40 [2016-04-11 08:13:15 +0000 - 2016-04-11 08:13:55 +0000]
- Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: COMPLETED ~ 00:00:40 [2016-04-11 08:13:55 +0000 - 2016-04-11 08:14:36 +0000]
- Elasticity Scalding Step: Shred Enriched Events: FAILED ~ 00:00:06 [2016-04-11 08:14:36 +0000 - 2016-04-11 08:14:42 +0000]
- Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
The step fails after 5-6 seconds and EMR has no logs for it at all, which makes it hard to debug.
I’m sure that there are events to shred (in other words, this is not the error described here: https://github.com/snowplow/snowplow/wiki/Troubleshooting#shred-fail).
I was using spot instances and disabled them (following the recommendation here: https://groups.google.com/forum/#!topic/snowplow-user/rFw6E4Ysafs), but I still have the same problem.
Any idea what could be wrong, or what I should look for?
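
Since the console shows no logs for the failed step, one thing that might be worth checking is whether the EMR API reports a failure reason for it. Below is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names `summarize_failed_steps` and `fetch_failed_steps` are hypothetical, the region is a parameter because it is not stated above, and the `FailureDetails` field may simply be absent for a step that dies this early.

```python
def summarize_failed_steps(steps):
    """Return (name, reason, message, log_file) for each FAILED step.

    `steps` is a list shaped like the "Steps" entry of an EMR ListSteps
    response. Missing fields come back as None.
    """
    failed = []
    for step in steps:
        status = step.get("Status", {})
        if status.get("State") != "FAILED":
            continue
        # FailureDetails is optional; EMR may not populate it.
        details = status.get("FailureDetails", {})
        failed.append((
            step.get("Name"),
            details.get("Reason"),
            details.get("Message"),
            details.get("LogFile"),
        ))
    return failed


def fetch_failed_steps(cluster_id, region_name):
    """Ask EMR for the failed steps of one jobflow and summarize them."""
    import boto3  # imported here so the helper above runs without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.list_steps(ClusterId=cluster_id, StepStates=["FAILED"])
    return summarize_failed_steps(response["Steps"])
```

Calling `fetch_failed_steps("j-2K7LCNCVIN965", region_name=...)` with the jobflow's actual region would list whatever reason, message and log location EMR recorded for the Shred Enriched Events step, if any.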