EmrEtlRunner unable to start


#1

Hi

I am new to Snowplow and want to run EmrEtlRunner in the Mumbai (ap-south-1) region, but I am facing the error below. Can anyone help me?

ArgumentError (AWS EMR API Error (ValidationException): The supplied release label is invalid: emr-4.5.0.):
    uri:classloader:/gems/elasticity-6.0.8/lib/elasticity/aws_session.rb:33:in `submit'
    uri:classloader:/gems/elasticity-6.0.8/lib/elasticity/emr.rb:302:in `run_job_flow'
    uri:classloader:/gems/elasticity-6.0.8/lib/elasticity/job_flow.rb:153:in `run'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:449:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'
    org/jruby/RubyKernel.java:973:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:955:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Thanks in advance.


#2

Hi @deepak

It looks like the EMR AMI version (release label) is set to 4.5.0 in your configuration file (hence the error you are seeing), but 4.5.0 isn’t available in ap-south-1. It is, however, available in the other Asia Pacific regions.

If you spin EMR up in one of those regions it should work fine. Otherwise it may (?) be possible to bump the EMR version in the config to 4.6.1, but I’m not too sure whether that would work.
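If you do try the version bump, it’s a one-line change under the `emr:` section of config.yml. A minimal sketch (using a throwaway stand-in file, since your real config path and contents will differ):

```shell
# Stand-in for the real config.yml (path and contents vary per install);
# created here only so the sketch is self-contained.
printf 'emr:\n  ami_version: 4.5.0\n  region: ap-south-1\n' > config.yml

# The actual change: bump the EMR release label from 4.5.0 to 4.6.1
sed -i 's/ami_version: 4.5.0/ami_version: 4.6.1/' config.yml

grep ami_version config.yml   # -> "  ami_version: 4.6.1"
```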


#3

Hi @mike, thanks for the quick response. Can you please tell me how I can check which EMR AMI versions are available in each Amazon region? Is there a document Amazon provides, or do we have to check manually?

Thanks!


#4

I wasn’t able to find any Amazon documentation linking EMR AMI versions to regions, but if you open
https://ap-south-1.console.aws.amazon.com/elasticmapreduce/home?region=ap-south-1#quick-create and switch out the region for the one in question, you’ll see the supported AMI versions in the ‘Release’ dropdown under ‘Software Configuration’.
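Since the region slug is the only part of that URL that changes, a trivial sketch for producing the quick-create link for whichever region you want to inspect:

```shell
# Pick the region to check; the 'Release' dropdown on the resulting console
# page lists the EMR release labels that region supports.
REGION=ap-southeast-1
echo "https://${REGION}.console.aws.amazon.com/elasticmapreduce/home?region=${REGION}#quick-create"
```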


#5

Thanks for the info, @mike.


#6

There’s a good chance it will work - we just haven’t got round to testing this ourselves. Give it a try @deepak?


#7

Hi @alex

Sure, I will give it a try and update you.


#8

Hi @alex @mike

The cluster is now launching successfully in the ap-south-1 region, but the job is still not running: it fails with a bootstrap failure. I have not specified any bootstrap step in my config file. I have not configured the storage part yet, so my config is missing those entries.

config.yml file:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: *********
  secret_access_key: ****************
  s3:
    region: ap-south-1
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://udmd-d-storage/udmd-d-etl/logs
      raw:
        in:                  # Multiple in buckets are permitted
          - s3://elasticbeanstalk-ap-south-1-872626332308/resources/environments/logs/publish/e-3g6bah32p3           # e.g. s3://my-in-bucket
        processing: s3://udmd-d-storage/udmd-d-etl
        archive: s3://udmd-d-storage/udmd-d-archive    # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://udmd-d-storage/udmd-d-enriched/enriched/good       # e.g. s3://my-out-bucket/enriched/good
        bad: s3://udmd-d-storage/udmd-d-enriched/enriched/bad        # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://udmd-d-storage/udmd-d-enriched/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://udmd-d-storage/udmd-d-archive/enrich/good    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://udmd-d-storage/udmd-d-enriched/shredded/good       # e.g. s3://my-out-bucket/shredded/good
        bad: s3://udmd-d-storage/udmd-d-enriched/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://udmd-d-storage/udmd-d-enriched/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://udmd-d-storage/udmd-d-archive/shredded/good    # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 4.6.1      # Don't change this
    region: ap-south-1        # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles

    ec2_subnet_id: subnet-91c2e2db # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: DemoEnricherKeyPair
   # bootstrap: []           # Set this to specify custom boostrap actions. Leave empty otherwise
    software:
      hbase:                # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      master_instance_type: c4.large
      core_instance_count: 2
      core_instance_type: c4.large
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: clj-tomcat  # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  job_name: Snowplow ETL # Give your job a name
  versions:
    hadoop_enrich: 1.8.0 # Version of the Hadoop Enrichment process
    hadoop_shred: 0.9.0 # Version of the Hadoop Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  download:
    folder: # Postgres-only config option. Where to store the downloaded files. Leave blank for Redshift
  targets:
    - name: "My Redshift database"
      type: redshift
      host: ADD HERE # The endpoint as shown in the Redshift console
      database: ADD HERE # Name of database
      port: 5439 # Default Redshift port
      ssl_mode: disable # One of disable (default), require, verify-ca or verify-full
      table: atomic.events
      username: ADD HERE
      password: ADD HERE
      maxerror: 1 # Stop loading on first error, or increase to permit more load errors
      comprows: 200000 # Default for a 1 XL node cluster. Not used unless --include compupdate specified

    - name: "My Elasticsearch database"
      type: elasticsearch
      host: ADD HERE # The Elasticsearch endpoint
      database: ADD HERE # Name of index
      port: 9200 # Default Elasticsearch port - change to 80 if using Amazon Elasticsearch Service
      sources: # Leave blank to write the bad rows created in this run to Elasticsearch, or explicitly provide an array of bad row buckets like ["s3://my-enriched-bucket/bad/run=2015-10-06-15-25-53"]
      ssl_mode: # Not required for Elasticsearch
      table: ADD HERE # Name of type
      username: # Not required for Elasticsearch
      password: # Not required for Elasticsearch
      es_nodes_wan_only: false # Set to true if using Amazon Elasticsearch Service
      maxerror: # Not required for Elasticsearch
      comprows: # Not required for Elasticsearch
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
snowplow:
  method: get
  app_id: clojureCollectorDem-env # e.g. snowplow
  collector: ec2-52-66-165-150.ap-south-1.compute.amazonaws.com  # e.g. d3rkrsqld9gmqf.cloudfront.net

Thanks in advance.


#9

@alex @mike, can you please help me understand which bootstrap steps the job is trying to invoke? Could this be a region issue? In the EMR console, below is the bootstrap step which is failing.

Location: s3://snowplow-hosted-assets-ap-south-1/common/emr/snowplow-ami4-bootstrap-0.2.0.sh Arguments:1.5

The error I am receiving:

./snowplow-emr-etl-runner --config ~/config.yml --resolver ~/snowplow-master/3-enrich/config/iglu_resolver.json --skip elasticsearch,staging
D, [2016-09-29T12:53:35.808000 #3362] DEBUG -- : Initializing EMR jobflow
D, [2016-09-29T12:53:40.264000 #3362] DEBUG -- : EMR jobflow j-WO53WX7OSZEI started, waiting for jobflow to complete...
W, [2016-09-29T12:55:41.747000 #3362]  WARN -- : Job failed. 2 tries left...
W, [2016-09-29T12:55:41.748000 #3362]  WARN -- : Bootstrap failure detected, retrying in 54 seconds...
D, [2016-09-29T12:56:35.756000 #3362] DEBUG -- : Initializing EMR jobflow
D, [2016-09-29T12:56:38.446000 #3362] DEBUG -- : EMR jobflow j-8MQ3W0PCDX8H started, waiting for jobflow to complete...
W, [2016-09-29T12:58:39.804000 #3362]  WARN -- : Job failed. 1 tries left...
W, [2016-09-29T12:58:39.805000 #3362]  WARN -- : Bootstrap failure detected, retrying in 250 seconds...
D, [2016-09-29T13:02:49.810000 #3362] DEBUG -- : Initializing EMR jobflow
D, [2016-09-29T13:02:52.258000 #3362] DEBUG -- : EMR jobflow j-2VE9KE7D9PZS5 started, waiting for jobflow to complete...
W, [2016-09-29T13:04:53.741000 #3362]  WARN -- : Job failed. 0 tries left...
F, [2016-09-29T13:04:53.742000 #3362] FATAL -- :

Snowplow::EmrEtlRunner::BootstrapFailureError (EMR jobflow j-2VE9KE7D9PZS5 failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [BOOTSTRAP_FAILURE] ~ elapsed time n/a [ - ]
 - 1. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 2. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 3. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity Scalding Step: Enrich Raw Events: CANCELLED ~ elapsed time n/a [ - ]):
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:469:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'
    org/jruby/RubyKernel.java:973:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:955:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'  

Thanks in advance.


#10

Hi @discourse

It’s my fifth day and I am still not able to set up Snowplow. Can anyone please help me with this issue? It would be a great help. I have tried this in three regions but have not been able to run it a single time.

This is the error log from the EU region, running the R75 build:

ArgumentError (AWS EMR API Error (ValidationException): Size of step parameter length exceeded the maximum allowed.):
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/elasticity-6.0.5/lib/elasticity/aws_session.rb:33:in `submit'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/elasticity-6.0.5/lib/elasticity/emr.rb:302:in `run_job_flow'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/elasticity-6.0.5/lib/elasticity/job_flow.rb:141:in `run'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:400:in `run'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /home/ec2-user/r75/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    file:/home/ec2-user/r75/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
    org/jruby/RubyKernel.java:1091:in `load'
    file:/home/ec2-user/r75/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    org/jruby/RubyKernel.java:1072:in `require'
    file:/home/ec2-user/r75/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    /tmp/jruby5342192161190253599extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

Thanks in advance.


#11

Hi,
Is it a long-running cluster?
Maybe you have reached the maximum number of jobflow steps; the error message sounds like that.
I think 256 is the maximum (https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_RunJobFlow.html).


#12

Hi @ecoron

I have just uploaded one log file (41 KB) to run EmrEtlRunner. How could it create that many steps?

Thanks!


#13

OK, I didn’t read the full thing. You are using ami_version 4.6.1? In the EU region, ami_version 4.5.0 (that’s what we are currently using, and I think it’s the official one) should work fine. The rest of the config looks OK to me.


#14

I am receiving the same error. I am now using ami_version: 4.5.0 and the rest is the same:
hadoop_enrich: 1.8.0 # Version of the Hadoop Enrichment process
hadoop_shred: 0.9.0 # Version of the Hadoop Shredding process
hadoop_elasticsearch: 0.1.0

with the r83-bald-eagle release.
I appreciate your help, @ecoron.


#15

Hmm, I just tested the same release for an upgrade some days ago and it works fine. But can you check in the AWS EMR console to see at which step it is failing? It could be that some passed arguments or values exceed the maximum allowed length (https://groups.google.com/forum/#!topic/mrjob/mX00_EElZoY).


#16

The cluster is not launching, so I am unable to see the logs; the runner fails before the cluster is launched.
Are my hadoop_enrich and hadoop_shred versions correct, @ecoron?

I am running this:
./snowplow-emr-etl-runner --config ~/snowplow-master/3-enrich/emr-etl-runner/config/config.yml.sample --resolver ~/snowplow-master/3-enrich/config/iglu_resolver.json --enrichments ~/snowplow-master/3-enrich/config/enrichments/ --skip elasticsearch,staging

I think there is some mismatch between the build, the Hadoop Enrich and Shred jars, and the AMI version, but I am not sure. Waiting for your reply.

Thanks!


#17

The versions are OK; I have the same. What happens if you try to start without the skip options?

./snowplow-emr-etl-runner --config ~/snowplow-master/3-enrich/emr-etl-runner/config/config.yml.sample --resolver ~/snowplow-master/3-enrich/config/iglu_resolver.json --enrichments ~/snowplow-master/3-enrich/config/enrichments/

Is there anything else customized, like the enrichments or the resolver?

You should see some console output like:
DEBUG -- : Staging raw logs...

DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
DEBUG -- : Initializing EMR jobflow
DEBUG -- : EMR jobflow j-XXXXXXXXXXX started, waiting for jobflow to complete...


#18

Hi @deepak - it’s not you, it’s us.

Basically for the Snowplow pipeline to run we have to deploy an array of hosted assets to a public S3 bucket in each AWS region.

Unfortunately for you we hadn’t yet set this up for Mumbai (ap-south-1). I have made a ticket for this now:

And we are now running the sync process. If all goes well, the sync should be complete in about an hour, and you can try again then.

Apologies for the confusion. It looks like you are the first Snowplow user in ap-south-1 - so let us know if you encounter any further problems downstream.
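For anyone hitting this later: judging by the failing bootstrap location earlier in the thread, the regional buckets appear to follow a `snowplow-hosted-assets-<region>` naming scheme (an inference from this thread, not official documentation). A sketch for deriving the bucket for a given region; the commented AWS CLI call would verify the sync but needs network access:

```shell
# Derive the regional hosted-assets bucket (naming inferred from the
# failing bootstrap path s3://snowplow-hosted-assets-ap-south-1/...).
REGION=ap-south-1
BUCKET="snowplow-hosted-assets-${REGION}"
echo "s3://${BUCKET}/common/emr/"

# With the AWS CLI installed, a public listing would confirm the assets
# are synced (commented out: requires network access):
#   aws s3 ls "s3://${BUCKET}/common/emr/" --no-sign-request
```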


#19

Hi @alex

Now the cluster is launching, but the job failed after 5 minutes. The error I am getting is:

D, [2016-10-03T07:25:44.074000 #3093] DEBUG -- : Initializing EMR jobflow
D, [2016-10-03T07:25:48.928000 #3093] DEBUG -- : EMR jobflow j-******* started, waiting for jobflow to complete...
F, [2016-10-03T07:39:51.965000 #3093] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-3****** failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATED_WITH_ERRORS [STEP_FAILURE] ~ 00:09:17 [2016-10-03 07:29:57 UTC - 2016-10-03 07:39:15 UTC]
 - 1. Elasticity Scalding Step: Enrich Raw Events: FAILED ~ 00:07:42 [2016-10-03 07:29:59 UTC - 2016-10-03 07:37:41 UTC]
 - 2. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 3. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:475:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'
    org/jruby/RubyKernel.java:973:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:955:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

I have started the same job multiple times and get the same error. Can you please help?

P.S.: The standard error I am getting in the EMR console:

Exception in thread "main" cascading.flow.FlowException: step failed: (3/3) ...d/run=2016-10-03-07-25-44, with job id: job_1475479683156_0003, please see cluster logs for failure messages
	at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:221)
	at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
	at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
	at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745) 

Thanks in advance.


#20

Hi @deepak - you’ll need to dig through the logs (there’s a wiki link shared in the error message that should help) to figure out why the job is failing after 7 minutes in Hadoop Enrich. Let us know what you find out!
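As a sketch of what digging through the logs looks like in practice: the jobflow id below is a placeholder, the log bucket is the one from the config.yml earlier in the thread, and the AWS CLI calls are commented out because they need credentials and network access.

```shell
# Placeholder jobflow id; use the real j-... id from the runner output or
# the EMR console. EMR writes logs under <log-bucket>/<jobflow-id>/.
JOBFLOW=j-XXXXXXXXXXXXX
LOG_BUCKET=s3://udmd-d-storage/udmd-d-etl/logs

# Pull all logs for the failed jobflow, then search for the real failure
# (commented out: needs AWS CLI + credentials):
#   aws s3 cp --recursive "${LOG_BUCKET}/${JOBFLOW}/" ./emr-logs/
#   grep -ri "exception" ./emr-logs/ | head

echo "${LOG_BUCKET}/${JOBFLOW}/"
```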