Error: ArgumentError (AWS EMR API Error (AccessDeniedException))

Hi there,

I am receiving the following error while trying to run these commands in AWS:

    ./snowplow-emr-etl-runner --config config/config.yml --resolver resolver.json
    ./snowplow-storage-loader --config config/config.yml --skip analyze

    (t0)    MOVE canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-15.a6f05099.gz -> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-15.a6f05099.us-west-2.raw_logs.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-14.d5ea3559.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-14.df253f74.gz

      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-15.2cbcb18c.us-west-2.raw_logs.gz
      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-14.f27abb33.us-west-2.raw_logs.gz
      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-15.33b3439d.us-west-2.raw_logs.gz
      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-15.77d4a421.us-west-2.raw_logs.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-15.2cbcb18c.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-14.f27abb33.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-15.33b3439d.gz
      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-15.9e165f54.us-west-2.raw_logs.gz
      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-15.a6f05099.us-west-2.raw_logs.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-15.77d4a421.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-15.9e165f54.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-15.a6f05099.gz
      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-15.0987d1f8.us-west-2.raw_logs.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-15.0987d1f8.gz
      +-> canvas-snowplow-logs/processing/EFGYHUV1JKTMS.2017-02-13-14.f25ba763.us-west-2.raw_logs.gz
      x canvas-snowplow-logs/raw_logs/EFGYHUV1JKTMS.2017-02-13-14.f25ba763.gz
    D, [2017-02-13T16:06:00.290000 #30696] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
    D, [2017-02-13T16:07:00.296000 #30696] DEBUG -- : Initializing EMR jobflow
    F, [2017-02-13T16:07:01.377000 #30696] FATAL -- : 

        ArgumentError (AWS EMR API Error (AccessDeniedException): ):
            /var/lib/jenkins/workspace/Canvas-Page-Load-Time-Snowplow/snowplow/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/aws_session.rb:33:in `submit'
            /var/lib/jenkins/workspace/Canvas-Page-Load-Time-Snowplow/snowplow/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/emr.rb:302:in `run_job_flow'
            /var/lib/jenkins/workspace/Canvas-Page-Load-Time-Snowplow/snowplow/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/job_flow.rb:151:in `run'
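
The failure happens at the point where Elasticity submits the RunJobFlow request to the EMR API, so the AccessDeniedException is coming back from AWS itself. A quick sanity check (assuming the AWS CLI is installed and configured with the same access_key_id/secret_access_key as config.yml) is:

    # Show which IAM identity the configured access key actually belongs to
    aws sts get-caller-identity

    # A read-only EMR call in the same region; if even this is denied, the
    # identity is missing elasticmapreduce permissions
    aws emr list-clusters --region us-west-2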

I am using the following config:

aws:
  access_key_id: XXXX
  secret_access_key: XXXXXX
  s3:
    region: us-west-2
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://canvas-snowplow-logs/etl-logs
      raw:
        in:
        - s3://canvas-snowplow-logs/raw_logs              # Multiple in buckets are permitted
        processing: s3://canvas-snowplow-logs/processing
        archive: s3://canvas-snowplow-logs/archive   # e.g. s3://my-archive-bucket/in
      enriched:
        good: s3://canvas-snowplow-logs/enriched/good      # e.g. s3://my-out-bucket/enriched/good
        bad: s3://canvas-snowplow-logs/enriched/bad        # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://canvas-snowplow-logs/enriched/errors     # Leave blank unless continue_on_unexpected_error: set to true below
        archive: s3://canvas-snowplow-logs/enriched/archive   # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://canvas-snowplow-logs/shredded/good       # e.g. s3://my-out-bucket/shredded/good
        bad: s3://canvas-snowplow-logs/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://canvas-snowplow-logs/shredded/errors     # Leave blank unless continue_on_unexpected_error: set to true below
        archive: s3://canvas-snowplow-logs/shredded/archive    # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 3.6.0      # Don't change this
    region: us-west-2       # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement: us-west-2a     # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id: # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: canvasSnowplowAnalytics
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:              # To launch on cluster, provide version, "0.92.0", keep quotes
      lingual: "1.1"             # To launch on cluster, provide version, "1.1", keep quotes
    # Adjust your Hadoop cluster below
    jobflow:
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
collectors:
  format: cloudfront # Or 'clj-tomcat' for the Clojure Collector, or 'thrift' for Thrift records, or 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs
enrich:
  job_name: Snowplow canvas ETL # Give your job a name
  versions:
    hadoop_enrich: 1.5.1 # Version of the Hadoop Enrichment process
    hadoop_shred: 0.7.0 # Version of the Hadoop Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
  continue_on_unexpected_error: false # Set to 'true' (and set out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  download:
    folder: # Postgres-only config option. Where to store the downloaded files. Leave blank for Redshift
  targets:
    - name: "Canvas snowplow database"
      type: redshift
      host: 
      database: logs 
      port: 5439 
      table: atomic.events
      username:
      password: 
      maxerror: 1 # Stop loading on first error, or increase to permit more load errors
      comprows: 200000 # Default for a 1 XL node cluster. Not used unless --include compupdate specified
      ssl_mode: disable
monitoring:
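
For reference, the jobflow_role and service_role in the emr: section are the two roles that aws emr create-default-roles creates. They can be re-created and verified like this (a sketch, assuming the AWS CLI is configured with credentials that are allowed to manage IAM roles):

    # Create EMR_DefaultRole and EMR_EC2_DefaultRole if they are missing
    aws emr create-default-roles

    # Confirm both roles exist in this account
    aws iam get-role --role-name EMR_DefaultRole
    aws iam get-role --role-name EMR_EC2_DefaultRole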

Any idea? Let me know if more information is needed.

This is the step where it tries to start the EMR cluster, so it looks like some IAM permissions are missing. Have a look at https://github.com/snowplow/snowplow/wiki/Setup-IAM-permissions-for-users-installing-Snowplow
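
If it is the IAM user behind those access keys that is missing EMR access, a minimal inline policy in the spirit of that wiki page could look like the sketch below. The user name snowplow-operator, the policy name and the ACCOUNT_ID placeholder are illustrative only; the full set of permissions Snowplow needs is on the wiki page.

    # Grant the user EMR access plus permission to pass the two default EMR
    # roles to the cluster (user name, policy name and account id are placeholders)
    aws iam put-user-policy \
      --user-name snowplow-operator \
      --policy-name snowplow-emr-access \
      --policy-document '{
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "elasticmapreduce:*",
            "Resource": "*"
          },
          {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": [
              "arn:aws:iam::ACCOUNT_ID:role/EMR_DefaultRole",
              "arn:aws:iam::ACCOUNT_ID:role/EMR_EC2_DefaultRole"
            ]
          }
        ]
      }'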
