EmrEtlRunner ArgumentError (AWS EMR API Error (ValidationException))


#1

I am receiving the following error while trying to run in AWS us-west-2:

./snowplow-emr-etl-runner --config snowplow-config.yml --resolver iglu_resolver.json
D, [2016-12-08T22:49:59.361000 #3628] DEBUG -- : Staging raw logs...
  moving files from s3://elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49/ to s3://production-snowplow/processing/
(t0)    MOVE elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49/i-5a79a3cf/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz -> production-snowplow/processing/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz
      +-> production-snowplow/processing/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz
      x elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49/i-5a79a3cf/var_log_tomcat8_rotated_localhost_access_log.2016-08-17-23.us-west-2.i-5a79a3cf.txt.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.us-west-2.i-5a79a3cf.gz
D, [2016-12-08T22:50:49.408000 #3628] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
D, [2016-12-08T22:51:49.415000 #3628] DEBUG -- : Initializing EMR jobflow
F, [2016-12-08T22:51:51.192000 #3628] FATAL -- : 

**ArgumentError (AWS EMR API Error (ValidationException): The supplied bootstrap action(s): 'Elasticity Bootstrap Action' are not supported by release 'emr-4.5.0'.):**
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/aws_session.rb:33:in `submit'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/emr.rb:302:in `run_job_flow'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/job_flow.rb:151:in `run'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:445:in `run'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /var/lib/jenkins/ops/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    file:/var/lib/jenkins/ops/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
    org/jruby/RubyKernel.java:1091:in `load'
    file:/var/lib/jenkins/ops/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    org/jruby/RubyKernel.java:1072:in `require'
    file:/var/lib/jenkins/ops/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    /tmp/jruby6178060773063677939extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

I am using the following config:

    aws:
      # Credentials can be hardcoded or set in environment variables
      access_key_id: <%= ENV['AWS_SNOWPLOW_ACCESS_KEY'] %>
      secret_access_key: <%= ENV['AWS_SNOWPLOW_SECRET_KEY'] %>
      s3:
        region: "us-west-2"
        buckets:
          assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
          jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
          log: s3://production-snowplow/logs
          raw:
            in:                  # Multiple in buckets are permitted
              - s3://elasticbeanstalk-us-west-2-513226749779/resources/environments/logs/publish/e-ikjchvqq49         # e.g. s3://my-in-bucket
            processing: s3://production-snowplow/processing
            archive: s3://production-snowplow/archive    # e.g. s3://my-archive-bucket/raw
          enriched:
            good: s3://production-snowplow/enriched/good       # e.g. s3://my-out-bucket/enriched/good
            bad: s3://production-snowplow/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
            errors:      # Leave blank unless :continue_on_unexpected_error: set to true below
            archive: s3://production-snowplow/enriched   # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
          shredded:
            good: s3://production-snowplow/shredded/good        # e.g. s3://my-out-bucket/shredded/good
            bad: s3://production-snowplow/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
            errors: ADD HERE     # Leave blank unless :continue_on_unexpected_error: set to true below
            archive: s3://production-snowplow/shredded   # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
      emr:
        ami_version: 4.5.0
        region: "us-west-2"       # Always set this
        jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
        service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
        placement: "us-west-2a"     # Set this if not running in VPC. Leave blank otherwise
        ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
        ec2_key_name: opskey.pem
        bootstrap:         # Set this to specify custom bootstrap actions. Leave empty otherwise
        software:
          hbase: "0.92.0"            # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
          lingual: "1.1"             # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
        # Adjust your Hadoop cluster below
        jobflow:
          master_instance_type: m1.medium
          core_instance_count: 2
          core_instance_type: m1.medium
          task_instance_count: 0 # Increase to use spot instances
          task_instance_type: m1.medium
          task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
        bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
        additional_info:        # Optional JSON string for selecting additional features
    collectors:
      format: clj-tomcat # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
    enrich:
      job_name: Snowplow ETL # Give your job a name
      versions:
        hadoop_enrich: 1.8.0 # Version of the Hadoop Enrichment process
        hadoop_shred: 0.9.0 # Version of the Hadoop Shredding process
        hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
      continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
      output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
    storage:
      download:
        folder: # Postgres-only config option. Where to store the downloaded files. Leave blank for Redshift
      targets:
        - name: "Elasticsearch"
          type: elasticsearch
          host: ec2-54-189-166-116.us-west-2.compute.amazonaws.com # The Elasticsearch endpoint
          database: snowplow # Name of index
          port: 9200 # Default Elasticsearch port - change to 80 if using Amazon Elasticsearch Service
          sources: # Leave blank to write the bad rows created in this run to Elasticsearch, or explicitly provide an array of bad row buckets like ["s3://my-enriched-bucket/bad/run=2015-10-06-15-25-53"]
          ssl_mode: # Not required for Elasticsearch
          table: ADD HERE # Name of type
          username: # Not required for Elasticsearch
          password: # Not required for Elasticsearch
          es_nodes_wan_only: false # Set to true if using Amazon Elasticsearch Service
          maxerror: # Not required for Elasticsearch
          comprows: # Not required for Elasticsearch
    monitoring:
      tags: {} # Name-value pairs describing this job
      logging:
        level: DEBUG # You can optionally switch to INFO for production

#2

Anyone?


#3

Hi @jlmoody - what version of EmrEtlRunner are you using?


#4

I am using R77, which is the one linked in the documentation. I have since found the link to additional versions in the documentation, so I am going to try the latest release now.


#5

Hi @jlmoody - could you share the documentation link you are referencing, so we can fix it? R77 is slightly old now.


#6

Section:
3. Installation

  1. Installation

We host EmrEtlRunner on the distribution platform JFrog Bintray. You can get a copy of it as shown below.

Note: Please, follow this link if you wish to get a different version of the runner. The distribution name follows the pattern snowplow_emr_{{RELEASE_VERSION}}.zip.

$ wget http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r77_great_auk.zip
The archive contains both EmrEtlRunner and StorageLoader. Unzip the archive:

$ unzip snowplow_emr_r77_great_auk.zip
You will see two files snowplow-emr-etl-runner and snowplow-storage-loader where the first one is the actual EmrEtlRunner.


#7

Thanks, ticket added:

https://github.com/snowplow/snowplow/issues/2997


#9

Hello, I’m trying to use EmrEtlRunner r87 and am experiencing this same error. Any idea why that would be?


#10

Hi Bryce,
My issue boiled down to a config problem. My storage target is Elasticsearch, so I needed to remove the other options under storage targets. Hope that helps. If not, I would be happy to take a look at your config file.


#11

Thanks, Jason! I made the error go away by removing the HBase and Lingual version numbers from the config; I probably don't need them at the moment anyway. Thank you for your reply and offer to help.
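
A minimal sketch of that change, based on the `emr` section of the config posted in #1 and showing only the relevant keys. It assumes the `hbase` and `lingual` entries are what cause the 'Elasticity Bootstrap Action' rejected by `emr-4.5.0`; the thread suggests this but does not confirm it, so treat it as a starting point rather than a definitive fix.

```yaml
aws:
  emr:
    ami_version: 4.5.0
    bootstrap:       # No custom bootstrap actions
    software:
      hbase:         # Left empty (was "0.92.0" in the config from #1)
      lingual:       # Left empty (was "1.1" in the config from #1)
```

The rest of the file stays as it was; only the two version strings under `software` are cleared.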