EMR failing : Enriched HDFS -> S3: FAILED

rahul · November 23, 2016, 10:19am

Hi,

We are facing an issue while running EmrEtlRunner: the EMR jobflow fails with the error below. Can you please explain what the problem could be?

**D, [2016-11-23T09:31:43.298000 #11815] DEBUG -- : Initializing EMR jobflow**
**D, [2016-11-23T09:31:48.286000 #11815] DEBUG -- : EMR jobflow j-1OEYR9HSGVLGH started, waiting for jobflow to complete...**
**F, [2016-11-23T09:43:50.677000 #11815] FATAL -- : **

**Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-1OEYR9HSGVLGH failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.**
**Snowplow Dev ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2016-11-23 09:41:29 +0000 - ]**
** - 1. Elasticity Scalding Step: Enrich Raw Events: COMPLETED ~ 00:01:52 [2016-11-23 09:41:33 +0000 - 2016-11-23 09:43:26 +0000]**
** - 2. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:14 [2016-11-23 09:43:28 +0000 - 2016-11-23 09:43:42 +0000]**
** - 3. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]**
** - 4. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]**
** - 5. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):**
**    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:475:in `run'**
**    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'**
**    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'**
**    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'**
**    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'**
**    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'**
**    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'**
**    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'**
**    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'**
**    org/jruby/RubyKernel.java:973:in `load'**
**    uri:classloader:/META-INF/main.rb:1:in `<main>'**
**    org/jruby/RubyKernel.java:955:in `require'**
**    uri:classloader:/META-INF/main.rb:1:in `(root)'**
**    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'**

rahul · November 28, 2016, 6:12am

Hi,

Can someone please explain what the problem is in the issue above? We are unable to run EMR because of it.

rahul · November 28, 2016, 6:52am

We are getting this error at the step where EMR tries to copy the enriched events from HDFS to the enriched good bucket:

INFO startExec 'hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src hdfs:///local/snowplow/enriched-events/ --dest s3://snowplow-etl-emr-runner/development/enriched/good/run=2016-11-28-05-40-24/ --srcPattern .*part-.* --s3Endpoint s3.amazonaws.com'

The error is: Input path does not exist: hdfs://ip-10-0-0-42.ec2.internal:8020/tmp/81630e53-eb06-47b2-a23c-7652cf14acb4/files

leon · November 30, 2016, 11:59am

Hi @rahul,

A good and popular tool to help with these kinds of issues is the Dataflow diagram on our GitHub wiki.

I am assuming that this is still the same error as in the first post; please correct me if that is not the case.

The error seems to indicate something went wrong with the EMR job. The first thing I would try is, as per the diagram, to empty the enriched:good files and rerun EmrEtlRunner with the --skip staging option.

Step failures happen occasionally. When they do, you need to restart from the correct step and in the correct way (please see the recovery steps below the image).
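As a rough sketch of that recovery, the snippet below builds and prints the two commands without running them. The bucket path is the one from the S3DistCp log above; the file names config.yml and resolver.json are placeholders, and the use of the AWS CLI to empty the location is an assumption, so adjust all of them to your own setup. Since the removal is destructive, you may prefer to move the files elsewhere instead.

```python
import shlex

# Assumption: enriched:good location, taken from the --dest in the log above
# (without the run= suffix). Replace with the value from your own config.
ENRICHED_GOOD = "s3://snowplow-etl-emr-runner/development/enriched/good/"


def recovery_commands(enriched_good, config="config.yml", resolver="resolver.json"):
    """Return the recovery commands as argument lists. Nothing is executed.

    config and resolver are hypothetical file names used for illustration.
    """
    # Step 1: empty the enriched:good location (assumes the AWS CLI is installed).
    empty_enriched_good = ["aws", "s3", "rm", enriched_good, "--recursive"]
    # Step 2: rerun EmrEtlRunner, skipping the staging step.
    rerun = [
        "snowplow-emr-etl-runner",
        "--config", config,
        "--resolver", resolver,
        "--skip", "staging",
    ]
    return [empty_enriched_good, rerun]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in recovery_commands(ENRICHED_GOOD):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing rather than executing is deliberate: it lets you check the S3 path before anything is deleted.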

Jeferson · April 11, 2017, 8:35pm

Hi @rahul,

I am facing the same case. How did you solve it?

Thanks
