Elasticity Scalding Step: Enrich Raw Events fails


#1

My EMR job always fails at the step 'Elasticity Scalding Step: Enrich Raw Events'. Also, I do not see any logs in my S3 bucket under 'XXX-snwplw-etl/logs/' or in the EMR console.

Included below is the stack trace:

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-XXX failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATED_WITH_ERRORS [STEP_FAILURE] ~ 00:08:52 [2016-07-19 08:13:30 UTC - 2016-07-19 08:22:22 UTC]

    1. Elasticity Setup Hadoop Debugging: COMPLETED ~ 00:00:19 [2016-07-19 08:13:33 UTC - 2016-07-19 08:13:52 UTC]
    1. Elasticity S3DistCp Step: Raw S3 -> HDFS: COMPLETED ~ 00:01:16 [2016-07-19 08:13:54 UTC - 2016-07-19 08:15:11 UTC]
    1. Elasticity Scalding Step: Enrich Raw Events: FAILED ~ 00:05:40 [2016-07-19 08:15:23 UTC - 2016-07-19 08:21:03 UTC]
    1. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
    1. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:471:in `run'
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
      /home/ec2-user/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
      file:/home/ec2-user/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
      org/jruby/RubyKernel.java:1091:in `load'
      file:/home/ec2-user/snowplow/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
      org/jruby/RubyKernel.java:1072:in `require'
      file:/home/ec2-user/snowplow/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
      /tmp/jruby6992624810045179703extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

#2

The ami_version is 4.5.0


#3

Hi @aloksimha,

Quite often, a consistent failure at the Elasticity Scalding Step: Enrich Raw Events step indicates a lack of EMR resources. The emr section of your configuration file dictates the type and number of EC2 instances used to create the EMR cluster for the job.

However, this is not the only possible reason. I suspect you have not looked through the logs thoroughly enough. Please refer to the EMR troubleshooting page for guidance: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce. The article has outdated screenshots but should give you an idea of what to look for.

If you go to the EMR service in the AWS console, you should see the cluster list. Your cluster appears under the name you gave it in the configuration file, accompanied by the job ID. Click on the one that failed and expand the Steps section. It should list all the attempted steps and their statuses. From there you should be able to access the corresponding Hadoop logs too.
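
If you would rather script this than click through the console, the same step list is available through the EMR API. This is only a minimal sketch, assuming boto3 is installed and AWS credentials are configured. `failed_steps` is a hypothetical helper name, and the sketch ignores pagination of the `list_steps` results:

```python
def failed_steps(emr_client, cluster_id):
    """Return (step_id, step_name) for every FAILED step of one cluster.

    emr_client is expected to be a boto3 EMR client; it is passed in so the
    function can also be exercised with a stub.
    """
    response = emr_client.list_steps(ClusterId=cluster_id, StepStates=["FAILED"])
    return [(step["Id"], step["Name"]) for step in response["Steps"]]


# Example usage (needs boto3 and credentials; j-XXX stands for the jobflow ID
# printed in the EmrEtlRunner error message):
#   import boto3
#   print(failed_steps(boto3.client("emr"), "j-XXX"))
```

The step ID returned here is what you need to locate that step's log files.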

Mind you, it might not be sufficient to examine only stderr. You might need to check the other logs too (controller, syslog, stdout).
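
When a log URI is configured for the cluster, EMR copies each step's logs to S3 as gzipped files under `<log URI>/<cluster ID>/steps/<step ID>/`. The sketch below just builds the four locations worth checking. `step_log_uris` is a hypothetical helper, and the bucket name and IDs in the usage line are placeholders, not values from this job:

```python
# The four per-step log files mentioned above.
STEP_LOG_FILES = ("controller", "syslog", "stderr", "stdout")


def step_log_uris(log_uri, cluster_id, step_id):
    """Build the S3 locations of the gzipped logs for one EMR step."""
    base = log_uri.rstrip("/")
    return [
        f"{base}/{cluster_id}/steps/{step_id}/{name}.gz"
        for name in STEP_LOG_FILES
    ]


# Placeholder values for illustration only.
for uri in step_log_uris("s3://example-etl-bucket/logs/", "j-EXAMPLE", "s-EXAMPLE"):
    print(uri)
```

If none of these objects exist, it is worth confirming that the cluster was actually started with a log URI pointing at the bucket you are looking in.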

Also, you might want to check the logs for the previous step even though it succeeded. They might give you a clue as to why the following step failed.
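
To pick out which steps to inspect, the step summary that EmrEtlRunner prints in its error message can be parsed directly. A rough sketch, assuming the summary lines look like the ones in the error output earlier in this thread. The regular expression and the function names are illustrative, not part of Snowplow:

```python
import re

# Matches lines such as "1. Elasticity Scalding Step: Enrich Raw Events: FAILED ~ 00:05:40 [...]".
STEP_LINE = re.compile(r"^\s*(?:-\s*)?\d+\.\s+(?P<name>.+?):\s+(?P<state>[A-Z_]+)\s+~")

# Excerpt of the step summary quoted in the original error message.
SUMMARY = """
    1. Elasticity Setup Hadoop Debugging: COMPLETED ~ 00:00:19 [2016-07-19 08:13:33 UTC - 2016-07-19 08:13:52 UTC]
    1. Elasticity S3DistCp Step: Raw S3 -> HDFS: COMPLETED ~ 00:01:16 [2016-07-19 08:13:54 UTC - 2016-07-19 08:15:11 UTC]
    1. Elasticity Scalding Step: Enrich Raw Events: FAILED ~ 00:05:40 [2016-07-19 08:15:23 UTC - 2016-07-19 08:21:03 UTC]
    1. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
"""


def parse_steps(summary):
    """Return (name, state) pairs in the order the steps are listed."""
    steps = []
    for line in summary.splitlines():
        match = STEP_LINE.match(line)
        if match:
            steps.append((match.group("name"), match.group("state")))
    return steps


def steps_to_inspect(steps):
    """Return the first FAILED step's name, preceded by the step before it."""
    for index, (_, state) in enumerate(steps):
        if state == "FAILED":
            return [name for name, _ in steps[max(index - 1, 0): index + 1]]
    return []


print(steps_to_inspect(parse_steps(SUMMARY)))
```

For the run above this points at the Raw S3 -> HDFS copy as well as the enrich step itself.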

–Ihor