EmrEtlRunner throwing 403 error at Staging phase

Running EmrEtlRunner throws the following error:

D, [2016-06-24T00:03:54.356000 #3250] DEBUG -- : Staging raw logs...
F, [2016-06-24T00:03:57.068000 #3250] FATAL -- :

Excon::Errors::Forbidden (Expected(200) <=> Actual(403 Forbidden)
excon.error.response
  :body          => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidAccessKeyId</Code><Message>The AWS Access Key Id you provided does not exist in our records.</Message><AWSAccessKeyId>XXXXXXX</AWSAccessKeyId><RequestId>4BA98CF755881B32</RequestId><HostId>hCyHyLOC1LpzLB7bDsk/X34ZDkI3lsGcuSTV5r/ZvrhWSuxqxkF5W6Kt0R3RTNJaXpaUD462+Wc=</HostId></Error>"
    /home/ubuntu/downloads/snowplow-emr-etl-runner!/gems/excon-0.45.3/lib/excon/middlewares/expects.rb:6:in `response_call'
    /home/ubuntu/downloads/snowplow-emr-etl-runner!/gems/excon-0.45.3/lib/excon/middlewares/response_parser.rb:8:in `response_call'

So this is clearly a permissions issue. I’ve created all the S3 buckets with the same IAM role I used to fire up the EC2 instance on which I’m testing EmrEtlRunner. What am I missing?

Here is my config file:

aws:
  access_key_id: xxxxxx # Credentials can be hardcoded or set in environment variables
  secret_access_key: xxxxxx
  s3:
    region: us-east-1
    buckets:
      assets: s3://snowplow-hosted-assets 
      jsonpath_assets: 
      log: s3://a1-snowplow-jars/emretl-runner/logs/
      raw:
        in:                  
          - s3://a1-snowplow-jars/emretl-runner/in/        
        processing: s3://a1-snowplow-jars/emretl-runner/processing/
        archive: s3://a1-snowplow-jars/emretl-runner/raw  
      enriched:
        good: s3://a1-snowplow-jars/emretl-runner/enriched/good/     
        bad: s3://a1-snowplow-jars/emretl-runner/enriched/bad/       
        errors: s3://a1-snowplow-jars/emretl-runner/enriched/errors/     
        archive: s3://a1-snowplow-jars/emretl-runner/enriched/archive/    
      shredded:
        good: s3://a1-snowplow-jars/emretl-runner/shredded/good/    
        bad: s3://a1-snowplow-jars/emretl-runner/shredded/bad/       
        errors: s3://a1-snowplow-jars/emretl-runner/shredded/errors/  
        archive: s3://a1-snowplow-jars/emretl-runner/shredded/archive/   
  emr:
    ami_version: 4.5.0
    region: us-east-1       
    jobflow_role: ecsInstanceRoleSnowPlow 
    service_role: ecsInstanceRoleSnowPlow    
    placement:      
    ec2_subnet_id: subnet-90a1c2bb 
    ec2_key_name: sai-kats-box
    bootstrap: []         
    software:
      hbase:           
      lingual:          
    jobflow:
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      task_instance_count: 0
      task_instance_type: m1.medium
      task_instance_bid: 0.015 
    additional_info:      
    bootstrap_failure_tries: 3 
collectors:
  format: clj-tomcat
enrich:
  job_name: potomac_snowplow_etl
  versions:
    hadoop_enrich: 1.7.0
    hadoop_shred: 0.9.0
    # hadoop_elasticsearch: 0.1.0
  continue_on_unexpected_error: true 
  output_compression: NONE 
storage:
  download:
    folder: 
  targets: []
monitoring:
  tags: {} 
  logging:
    level: DEBUG 
  snowplow:
    method: get
    app_id: snowplow 
    collector: d2bpvzh93js6np.cloudfront.net 

Hi @carlitros,

This kind of error does occasionally creep in while copying files to/from S3 buckets. You might even get it after a few files have copied successfully (that is, midway through copying the batch). In your case, however, you got the error right from the start.

  • Is your orchestrating box (EC2 instance) located in the same region you specified for your buckets (that is, us-east-1)?
  • Have you verified that the AWS Access Key Id is indeed correct and available in that region? (A quick check is sketched after this list.)
  • Have you tried rerunning the job?
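
For the second point, here is a quick check you could run from the orchestrating box. This is only a sketch: it assumes boto3 is installed and reuses the same access key, secret key, region and log bucket (a1-snowplow-jars) from your config; nothing in it is EmrEtlRunner-specific.

# check_credentials.py - verify the key pair from config.yml actually exists
# and can reach the S3 log bucket. The "xxxxxx" values are placeholders;
# substitute the real ones from your config.
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(
    aws_access_key_id="xxxxxx",       # aws.access_key_id from config.yml
    aws_secret_access_key="xxxxxx",   # aws.secret_access_key from config.yml
    region_name="us-east-1",          # aws.s3.region from config.yml
)

try:
    # Rejected with InvalidClientTokenId if the access key is unknown to AWS
    identity = session.client("sts").get_caller_identity()
    print("Key belongs to account %s (%s)" % (identity["Account"], identity["Arn"]))

    # Rejected with AccessDenied if the key exists but lacks S3 permissions
    response = session.client("s3").list_objects_v2(
        Bucket="a1-snowplow-jars", Prefix="emretl-runner/logs/", MaxKeys=5
    )
    print("Listed %d object(s) in the log bucket" % response.get("KeyCount", 0))
except ClientError as err:
    print("AWS rejected the call: %s - %s"
          % (err.response["Error"]["Code"], err.response["Error"]["Message"]))

If the very first call already fails with InvalidClientTokenId, that matches your InvalidAccessKeyId error: the key pair in the config isn’t recognised by AWS at all, regardless of what the EC2 instance role allows.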

Regards,
Ihor

Did this happen only shortly (within 1-2 minutes) after creating the S3 buckets / IAM roles for the first time? I’ve seen cases where IAM policies and credentials aren’t immediately applied to objects globally (particularly on S3), possibly due to the mixed consistency model of S3 and the propagation delay of IAM changes. If so, a short wait and a rerun usually clears it up.
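
If you want to script that wait rather than retry by hand, something like the following could work. It is only a rough sketch (wait_for_key is a hypothetical helper, not part of EmrEtlRunner): it polls STS until a freshly created access key is recognised before you kick off the run.

# Rough sketch: wait out IAM propagation delay by polling until a newly created
# access key is recognised by AWS. Purely illustrative, not part of EmrEtlRunner.
import time
import boto3
from botocore.exceptions import ClientError

def wait_for_key(access_key, secret_key, attempts=10, delay=15):
    sts = boto3.client(
        "sts", aws_access_key_id=access_key, aws_secret_access_key=secret_key
    )
    for attempt in range(attempts):
        try:
            sts.get_caller_identity()   # succeeds once the key has propagated
            print("Key recognised after roughly %d seconds" % (attempt * delay))
            return True
        except ClientError as err:
            # InvalidClientTokenId means "key not known (yet)"; anything else is
            # a different problem and should be surfaced immediately
            if err.response["Error"]["Code"] != "InvalidClientTokenId":
                raise
            time.sleep(delay)
    return False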