EMR failing: Enriched HDFS -> S3: FAILED


#1

Hi,

We are facing an issue while running EMR and are getting the following error. Can you please explain what the problem could be?

```
D, [2016-11-23T09:31:43.298000 #11815] DEBUG -- : Initializing EMR jobflow
D, [2016-11-23T09:31:48.286000 #11815] DEBUG -- : EMR jobflow j-1OEYR9HSGVLGH started, waiting for jobflow to complete...
F, [2016-11-23T09:43:50.677000 #11815] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-1OEYR9HSGVLGH failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow Dev ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2016-11-23 09:41:29 +0000 - ]
 - 1. Elasticity Scalding Step: Enrich Raw Events: COMPLETED ~ 00:01:52 [2016-11-23 09:41:33 +0000 - 2016-11-23 09:43:26 +0000]
 - 2. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:14 [2016-11-23 09:43:28 +0000 - 2016-11-23 09:43:42 +0000]
 - 3. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:475:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'
    org/jruby/RubyKernel.java:973:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:955:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
```
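The step list in this output shows where the run needs to be recovered from: step 1 (Enrich Raw Events) completed, step 2 (Enriched HDFS -> S3) failed, and everything after it was cancelled. As a minimal sketch, the Python below pulls the first failed step out of that output. The line format is inferred from this one log only, so the regular expression is an assumption, not a documented EmrEtlRunner format, and `parse_steps`/`first_failed` are hypothetical helpers.

```python
import re
from typing import List, NamedTuple, Optional

# Assumption: step lines look like the ones in the log above, e.g.
#  - 2. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:14 [...]
STEP_LINE = re.compile(
    r"^\s*-\s*(?P<index>\d+)\.\s+(?P<name>.+):\s+(?P<status>[A-Z_]+)\s+~"
)


class Step(NamedTuple):
    index: int
    name: str
    status: str


def parse_steps(log_text: str) -> List[Step]:
    """Extract (index, name, status) for every step line in the log."""
    steps = []
    for line in log_text.splitlines():
        match = STEP_LINE.match(line)
        if match:
            steps.append(Step(int(match["index"]), match["name"], match["status"]))
    return steps


def first_failed(steps: List[Step]) -> Optional[Step]:
    """Return the first FAILED step, or None if no step failed."""
    return next((s for s in steps if s.status == "FAILED"), None)


log = """\
 - 1. Elasticity Scalding Step: Enrich Raw Events: COMPLETED ~ 00:01:52 [2016-11-23 09:41:33 +0000 - 2016-11-23 09:43:26 +0000]
 - 2. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:14 [2016-11-23 09:43:28 +0000 - 2016-11-23 09:43:42 +0000]
 - 3. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
"""

failed = first_failed(parse_steps(log))
print(failed)
```

Knowing the failed step matters because the recovery procedure depends on it: a failure after Enrich has completed is handled differently from a failure in Enrich itself.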

#2

Hi,

Can someone please explain what the problem is in the issue mentioned above? We are unable to run EMR because of it.


#3

We are getting this error at the step where EMR tries to copy the enriched events from HDFS to the enriched good bucket:

```
INFO startExec 'hadoop jar /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src hdfs:///local/snowplow/enriched-events/ --dest s3://snowplow-etl-emr-runner/development/enriched/good/run=2016-11-28-05-40-24/ --srcPattern .*part-.* --s3Endpoint s3.amazonaws.com'
```

The error is: `Input path does not exist: hdfs://ip-10-0-0-42.ec2.internal:8020/tmp/81630e53-eb06-47b2-a23c-7652cf14acb4/files`
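S3DistCp's `--srcPattern` option is a regular expression that filters which files under `--src` get copied. One plausible reading of this error, which is an assumption and not confirmed anywhere in this thread, is that no file under `hdfs:///local/snowplow/enriched-events/` matched the pattern, for example because the Enrich step wrote no `part-` files. The Python sketch below mimics that filtering on a list of paths. It assumes the whole path has to match (as Java's `Matcher.matches()` would), uses `.*part-.*` as an assumed form of the part-file pattern, and `matching_files` is a hypothetical helper, not part of S3DistCp.

```python
import re
from typing import Iterable, List


def matching_files(paths: Iterable[str], src_pattern: str) -> List[str]:
    """Return the paths a --srcPattern-style regex would select.

    Assumption: the whole path must match (full match, like Java's
    Matcher.matches()); this mimics, but is not, S3DistCp's own code.
    """
    pattern = re.compile(src_pattern)
    return [p for p in paths if pattern.fullmatch(p)]


# Hypothetical listing of hdfs:///local/snowplow/enriched-events/ after Enrich.
enriched = [
    "hdfs:///local/snowplow/enriched-events/part-00000",
    "hdfs:///local/snowplow/enriched-events/part-00001",
    "hdfs:///local/snowplow/enriched-events/_SUCCESS",
]

# Compare a normal listing with an empty one.
for listing in (enriched, []):
    selected = matching_files(listing, r".*part-.*")
    if selected:
        print(f"{len(selected)} file(s) would be copied")
    else:
        # Assumed cause of "Input path does not exist": nothing to copy.
        print("no part- files matched: check the Enrich step's output")
```

If this reading is right, the thing to check is whether the Enrich step actually produced any enriched output for that run.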


#4

Hi @rahul,

A good and popular tool to help you with these kinds of issues is the Dataflow diagram on our GitHub wiki.

I am assuming this is still the same error as in the first post; please correct me if that is not the case.

The error seems to indicate that the EMR job failed at the Enriched HDFS -> S3 step. As per the diagram, the first thing I would try is to empty the `enriched:good` files and rerun EmrEtlRunner with the `--skip staging` option.

Step failures sometimes happen, and you may need to restart from the correct step and in the correct way (please see the recovery steps below the image).
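For a failure in the Enriched HDFS -> S3 step, the recovery suggested above can be sketched as a dry run: pick out the objects under the enriched:good prefix that need removing, and build the rerun command with `--skip staging`. This is an illustrative Python sketch, not an EmrEtlRunner feature. `plan_recovery` is a hypothetical helper, the bucket listing and config/resolver paths are placeholders, and the `--config`/`--resolver` flags are assumed to match your EmrEtlRunner release. The actual deletion could be done separately, e.g. with `aws s3 rm s3://<bucket>/<prefix> --recursive`.

```python
from typing import List, Tuple


def plan_recovery(
    keys: List[str],
    enriched_good_prefix: str,
    config_path: str,
    resolver_path: str,
) -> Tuple[List[str], List[str]]:
    """Dry run: return (S3 keys to delete, EmrEtlRunner argv for the rerun).

    Assumes the failed step was "Enriched HDFS -> S3", so recovery means
    emptying enriched:good and rerunning with --skip staging (the raw files
    were presumably already staged by the failed run).
    """
    # Normalise the prefix so "enriched/good" and "enriched/good/" behave alike.
    prefix = enriched_good_prefix.rstrip("/") + "/"
    to_delete = [key for key in keys if key.startswith(prefix)]
    # Assumption: flag names as in 2016-era EmrEtlRunner releases.
    argv = [
        "./snowplow-emr-etl-runner",
        "--config", config_path,      # placeholder path
        "--resolver", resolver_path,  # placeholder path
        "--skip", "staging",
    ]
    return to_delete, argv


# Hypothetical bucket listing; the prefix comes from the --dest in post #3.
keys = [
    "development/enriched/good/run=2016-11-28-05-40-24/part-00000",
    "development/enriched/bad/run=2016-11-28-05-40-24/part-00000",
]
to_delete, argv = plan_recovery(
    keys, "development/enriched/good", "config/config.yml", "config/resolver.json"
)
print("delete:", to_delete)
print("rerun:", " ".join(argv))
```

Keeping it as a dry run makes it easy to review exactly what would be removed before touching the bucket.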


#5

Hi @rahul,

I am facing the same issue. How did you solve it?

Thanks