Elasticity Scalding Step: Enrich Raw Events fails

aloksimha · July 19, 2016, 8:30am

My EMR job always fails at the step ‘Elasticity Scalding Step: Enrich Raw Events’. BTW, I do not see any logs in my S3 bucket under 'XXX-snwplw-etl/logs/' or in the EMR console.

The stack trace is included below:

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-XXX failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATED_WITH_ERRORS [STEP_FAILURE] ~ 00:08:52 [2016-07-19 08:13:30 UTC - 2016-07-19 08:22:22 UTC]

1. Elasticity Setup Hadoop Debugging: COMPLETED ~ 00:00:19 [2016-07-19 08:13:33 UTC - 2016-07-19 08:13:52 UTC]
1. Elasticity S3DistCp Step: Raw S3 -> HDFS: COMPLETED ~ 00:01:16 [2016-07-19 08:13:54 UTC - 2016-07-19 08:15:11 UTC]
1. Elasticity Scalding Step: Enrich Raw Events: FAILED ~ 00:05:40 [2016-07-19 08:15:23 UTC - 2016-07-19 08:21:03 UTC]
1. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
1. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
1. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
1. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:471:in `run'
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
  /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
  file:/home/ec2-user/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
  org/jruby/RubyKernel.java:1091:in `load'
  file:/home/ec2-user/snowplow/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
  org/jruby/RubyKernel.java:1072:in `require'
  file:/home/ec2-user/snowplow/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
  /tmp/jruby6992624810045179703extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

aloksimha · July 19, 2016, 8:38am

The ami_version is 4.5.0

ihor · July 21, 2016, 9:59pm

Hi @aloksimha,

Quite often, a consistent failure at the Elasticity Scalding Step: Enrich Raw Events step indicates a lack of EMR resources. The emr section of your configuration file determines the type and number of EC2 instances used to create the EMR cluster for the job.

However, this is not the only possible cause. I believe you might not have looked for the logs thoroughly enough. Please refer to the EMR Troubleshooting page for guidance: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce The article has outdated screenshots but should give you an idea of what to look for.

If you go to the EMR service on AWS, you should see the cluster list, with your cluster shown under the name you gave it in the configuration file and accompanied by its job ID. Click on the one that failed and expand the Steps section. It should list all the attempted steps and their statuses. From there you should be able to access the corresponding Hadoop logs too.

Mind you, it might not be sufficient to examine only stderr. You might need to check the other logs too (controller, syslog, stdout).

Also, you might want to check the logs for the previous step even though it succeeded. They might give you a clue as to why the following step failed.
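If the console links are not convenient, the same four log files can usually be found directly in the S3 log bucket. The sketch below is only an illustration, not part of EmrEtlRunner: it assumes the usual EMR layout of <log URI>/<cluster ID>/steps/<step ID>/<name>.gz, the helper name step_log_uris is made up for this example, and the step ID s-XXX is a placeholder you would replace with the real one from the Steps section.

```python
from urllib.parse import urlparse

# The per-step log files mentioned above.
STEP_LOG_NAMES = ("controller", "syslog", "stderr", "stdout")


def step_log_uris(log_uri, cluster_id, step_id):
    """Return the expected S3 URIs of one step's log files.

    Assumption: EMR archives step logs under
    <log_uri>/<cluster_id>/steps/<step_id>/<name>.gz
    """
    parsed = urlparse(log_uri)
    if parsed.scheme not in ("s3", "s3n", "s3a") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {log_uri!r}")
    # Normalise to s3://bucket/prefix with no trailing slash.
    base = f"s3://{parsed.netloc}/{parsed.path.strip('/')}".rstrip("/")
    return [
        f"{base}/{cluster_id}/steps/{step_id}/{name}.gz"
        for name in STEP_LOG_NAMES
    ]


if __name__ == "__main__":
    # Placeholder bucket and IDs; substitute your own values.
    for uri in step_log_uris("s3://XXX-snwplw-etl/logs/", "j-XXX", "s-XXX"):
        print(uri)
```

Running it for both the failed step and the step before it gives a list of objects to download and inspect. Logs are copied to S3 periodically rather than instantly, so they may show up a few minutes after the step ends.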

–Ihor
