ArgumentError: AWS EMR API Error (ValidationException): Invalid InstanceProfile: EMR_EC2_DefaultRole

I am using this EmrEtlRunner zip: snowplow_emr_r102_afontova_gora.zip

I’m not sure what I might be doing wrong; any help would be greatly appreciated.

I did skip "Use EBS-backed instances and set the HD size", "Enable connection draining for your Elastic Beanstalk instance" and "Configure autoscaling settings" because my data is so small and this is just for testing for now. If that makes a difference I will configure those, but the way the guide is worded it seems like the step I am at should still work.

I am getting the following error:

D, [2020-01-11T01:31:17.776000 #31639] DEBUG -- : Initializing EMR jobflow
ArgumentError: AWS EMR API Error (ValidationException): Invalid InstanceProfile: EMR_EC2_DefaultRole.
                    submit at uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/aws_session.rb:33
              run_job_flow at uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/emr.rb:302
                       run at uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/job_flow.rb:173
                       run at uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:621
                   send_to at uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43
                 call_with at uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76
  block in redefine_method at uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138
                       run at uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:109
                   send_to at uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43
                 call_with at uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76
  block in redefine_method at uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138
                    <main> at uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41   
                      load at org/jruby/RubyKernel.java:979
                    <main> at uri:classloader:/META-INF/main.rb:1
                   require at org/jruby/RubyKernel.java:961
                    (root) at uri:classloader:/META-INF/main.rb:1
                    <main> at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1
ERROR: org.jruby.embed.EvalFailedException: (ArgumentError) AWS EMR API Error (ValidationException): Invalid InstanceProfile: EMR_EC2_DefaultRole.
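
(As I understand it, this ValidationException means EMR cannot find, or cannot use, an instance profile named EMR_EC2_DefaultRole in the account. A quick way to sanity-check that from the CLI, assuming it is configured with the same credentials the runner uses:

# Does the role exist, and is it wrapped in an instance profile of the same name?
aws iam get-role --role-name EMR_EC2_DefaultRole
aws iam list-instance-profiles-for-role --role-name EMR_EC2_DefaultRole
)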

This is my config.yml

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: 'gfdgs'
  secret_access_key: 'gfds+gfds'
  s3:
    region: 'us-east-2'
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://enrich-logs-riel-design
      raw:
        in:
          - s3://elasticbeanstalk-us-east-2-410131593755/resources/environments/logs/publish/e-dpcpvtxxpi
        processing: s3://enrich-processing
        archive: s3://enrich-archive    # e.g. s3://my-archive-bucket/in
      enriched:
        good: s3://enrich-good       # e.g. s3://my-out-bucket/enriched/good
        bad: s3://enrich-bad        # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://enrich-errors     # Leave blank unless continue_on_unexpected_error: set to true below
        archive: s3://enrich-archive-enriched    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://enrich-shredded/good       # e.g. s3://my-out-bucket/shredded/good
        bad: s3://enrich-shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://enrich-shredded/errors     # Leave blank unless continue_on_unexpected_error: set to true below
        archive: s3://enrich-shredded/archive   # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.9.0      # Don't change this
    region: 'us-east-2'      # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement:      # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id: subnet-a92da0e5 # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: snowplow-ec2
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:                # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Spark cluster below
    jobflow:
      job_name: 'Snowplow'
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 100    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: 'clj-tomcat' # Or 'clj-tomcat' for the Clojure Collector, or 'thrift' for Thrift records, or 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs
enrich:
  versions:
    spark_enrich: 1.18.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_shredder: 0.13.0        # Version of the Relational Database Shredding process
    rdb_loader: 0.14.0          # Version of the Relational Database Loader app
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: e-dpcpvtxxpi # e.g. snowplow
    collector: riel-design-email-open.us-east-2.elasticbeanstalk.com # e.g. d3rkrsqld9gmqf.cloudfront.net

I’m confused. I had followed these instructions to set up the EC2 instance: https://github.com/snowplow/snowplow/wiki/Setting-up-EC2-instance-for-EmrEtlRunner-and-StorageLoader
Now the link you posted has me delete the default role I set up, but the next steps are about setting up a cluster in the AWS console. Do I need to do that, or do I just need to run through the steps again and delete my EC2 instance altogether?
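
(If it really only comes down to the roles, they can apparently be recreated straight from the CLI, without touching the EC2 instance or creating a cluster in the console, e.g.:

# Recreates EMR_DefaultRole, EMR_EC2_DefaultRole and the matching instance profile if they are missing
aws emr create-default-roles
)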

Hey,
we are receiving exactly the same error, but we are running Dataflow Runner on Fargate. Did you manage to solve it? We are still stuck. How can we delete and recreate the role for the EC2 instances when our runner is on Fargate? Our config is quite similar:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "com.myapp", 
    "logUri": "s3n://logs/",
    "region": "eu-central-1", 
    "credentials": {
      "accessKeyId": "AWS_ACCESS_KEY_ID", 
      "secretAccessKey": "AWS_SECRET_ACCESS_KEY"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": { 
      "amiVersion": "4.8.2", 
      "keyName": "snowplow-com-example-key",
      "location": {
        "vpc": {
          "subnetId": "AWS_SUBNET_PUBLIC_ID"
        }
      },
      "instances": {
        "master": {
          "type": "m4.large"
        },
        "core": {
          "type": "m3.xlarge",
          "count": 1
        },
        "task": {
          "type": "m4.large",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [ 
      {
        "key": "client",
        "value": "com.myapp"
      },
      {
        "key": "job",
        "value": "recovery"
      }
    ],
    "bootstrapActionConfigs": [
      {
        "name": "Elasticity Bootstrap Action",
        "scriptBootstrapAction": {
          "path": "s3://snowplow-hosted-assets-eu-central-1/common/emr/snowplow-ami4-bootstrap-0.2.0.sh",
          "args": [ "1.5" ]
        }
      }
    ],
    "applications": [ "Hadoop"]
  }
}

Hey @mgloel, did you manage to fix this? If so, how?

Hey @fwahlqvist, yes we did. Before launching the EMR cluster we have to run these commands (in exactly that order), as specified in the link that @mike provided above.

We are running a bash script in the container which looks like this:

detach_emr_default_roles.sh

#!/bin/bash

# Tear down EMR_EC2_DefaultRole if it exists: the role has to be removed from
# its instance profile before the profile and the role itself can be deleted
if aws iam wait role-exists --role-name EMR_EC2_DefaultRole; then
    aws iam remove-role-from-instance-profile --instance-profile-name EMR_EC2_DefaultRole --role-name EMR_EC2_DefaultRole
    aws iam delete-instance-profile --instance-profile-name EMR_EC2_DefaultRole

    aws iam detach-role-policy --role-name EMR_EC2_DefaultRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role
    aws iam delete-role --role-name EMR_EC2_DefaultRole
fi

# Tear down EMR_DefaultRole if it exists: detach its managed policies before deleting it
if aws iam wait role-exists --role-name EMR_DefaultRole; then
    aws iam detach-role-policy --role-name EMR_DefaultRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
    aws iam detach-role-policy --role-name EMR_DefaultRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole
    aws iam delete-role --role-name EMR_DefaultRole
fi

# Recreate both default roles and the EMR_EC2_DefaultRole instance profile from scratch
aws emr create-default-roles
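
The container entrypoint just runs that script before invoking Dataflow Runner, roughly like the sketch below (script and config file names are placeholders, not our exact paths):

#!/bin/bash
set -e

# Reset the EMR default roles first (script above)
./detach_emr_default_roles.sh

# Then launch the transient EMR cluster and run the playbook steps
./dataflow-runner run-transient --emr-config cluster.json --emr-playbook playbook.json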

Thanks @mgloel, what IAM permissions did you have to give the Fargate role to run this?

Even after updating with the above script, I get the following error:

ValidationException: Invalid InstanceProfile: EMR_EC2_DefaultRole.

I can also see the full JSON in CloudWatch, and that the roles are being created.

EDIT: I tried giving the Fargate role full IAM permissions and still get the same error.

Our Fargate role permissions (this is our testing instance, so some permissions are redundant here :wink:):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "sqs:*",
                "s3:*",
                "kms:*",
                "kinesis:*",
                "emr:*",
                "ec2:*",
                "dynamodb:*",
                "cloudwatch:*"
            ],
            "Resource": [
                "arn:aws:s3:::sp-shredded-log-bucket/*",
                "arn:aws:s3:::sp-shredded-log-bucket",
                "arn:aws:s3:::sp-shredded/*",
                "arn:aws:s3:::sp-shredded",
                "arn:aws:s3:::sp-loader/*",
                "arn:aws:s3:::sp-loader",
                "arn:aws:kms:eu-west-1:AWS_ACCOUNT_NO:key/KEY_HASH"
            ]
        }
    ]
}

Trust relationships

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::AWS_ACCOUNT_NO:role/EMR_EC2_DefaultRole",
        "Service": [
          "ec2.amazonaws.com",
          "elasticmapreduce.amazonaws.com",
          "ecs-tasks.amazonaws.com",
          "s3.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Hey @mgloel, thanks.
Am I correct in saying that this is for the execution role, and that for the task role you use EMR_EC2_DefaultRole, or have I got this wrong?

Hey @fwahlqvist,

The first policy is attached to our TaskRole.
The second one (the assume-role policy) is attached to our TaskExecutionRole.
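
In CLI terms (purely as an illustration, role and file names are placeholders) that split looks something like:

# Permissions policy goes onto the task role
aws iam put-role-policy \
  --role-name snowplow-task-role \
  --policy-name snowplow-task-permissions \
  --policy-document file://task-role-policy.json

# Trust (assume-role) policy goes onto the task execution role
aws iam update-assume-role-policy \
  --role-name snowplow-task-execution-role \
  --policy-document file://trust-relationships.json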
Best,
M.

Thanks @mgloel for getting back quickly.
Strangely enough, even with these changes I still get:

Invalid InstanceProfile: EMR_EC2_DefaultRole.\n\tstatus code: 400

I have verified the caller identity in the bash script with

aws sts get-caller-identity

and can confirm that it is my role:

"Arn": "arn:aws:sts::XXXXXXXXXX:assumed-role/snowplow-rdbloader_task/XXXXXXXXXXXXXX"  

The updated rdbloader-task role has the following IAM policy (for testing):

{
"Version": "2012-10-17",
"Statement": [
    {
        "Effect": "Allow",
        "Action": [
            "kinesis:DescribeStream",
            "kinesis:ListStreams",
            "kinesis:GetShardIterator",
            "kinesis:GetRecords"
        ],
        "Resource": [
            "*"
        ]
    },
    {
        "Effect": "Allow",
        "Action": [
            "dynamodb:CreateTable",
            "dynamodb:DescribeTable",
            "dynamodb:Scan",
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:DeleteItem"
        ],
        "Resource": [
            "*"
        ]
    },
    {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [
            "sqs:DeleteMessage",
            "sqs:SendMessageBatch",
            "sqs:ReceiveMessage",
            "sqs:ListQueueTags",
            "sqs:DeleteMessageBatch",
            "sqs:ChangeMessageVisibilityBatch",
            "sqs:SetQueueAttributes",
            "sqs:GetQueueUrl",
            "sqs:ListQueues",
            "sqs:ChangeMessageVisibility",
            "sqs:SendMessage",
            "sqs:GetQueueAttributes",
            "sqs:ListDeadLetterSourceQueues",
            "sqs:PurgeQueue",
            "sqs:DeleteQueue",
            "sqs:CreateQueue"
        ],
        "Resource": "*"
    },
    {
        "Effect": "Allow",
        "Action": [
            "iam:RemoveRoleFromInstanceProfile",
            "iam:CreateRole",
            "iam:DetachRolePolicy",
            "iam:DeleteInstanceProfile",
            "iam:GetRole",
            "iam:DeleteRole",
            "iam:AttachRolePolicy",
            "iam:GetPolicy",
            "iam:GetPolicyVersion",
            "iam:GetInstanceProfile",
            "iam:CreateInstanceProfile",
            "iam:AddRoleToInstanceProfile",
            "iam:PassRole",
            "iam:ListInstanceProfiles"
        ],
        "Resource": [
            "*"
        ]
    },
    {
        "Effect": "Allow",
        "Action": [
            "sqs:*",
            "s3:*",
            "kms:*",
            "kinesis:*",
            "emr:*",
            "ec2:*",
            "dynamodb:*",
            "cloudwatch:*"
        ],
        "Resource": [
            "*"
        ]
    }
]
}

The RDB Loader task execution role has the following trust relationships:

The identity provider(s) elasticmapreduce.amazonaws.com
The identity provider(s) ecs-tasks.amazonaws.com
The identity provider(s) ec2.amazonaws.com
The identity provider(s) s3.amazonaws.com
{{ACCOUNT IDENTIFIER}}

and the default AmazonECSTaskExecutionRolePolicy attached.
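
(If it helps, what is actually on a role — both its attached managed policies and its trust policy — can be dumped straight from the CLI; the role name below is taken from the ARN above:

# Managed policies attached to the role
aws iam list-attached-role-policies --role-name snowplow-rdbloader_task

# Its trust (assume-role) policy
aws iam get-role --role-name snowplow-rdbloader_task --query 'Role.AssumeRolePolicyDocument'
)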

Feels like I am missing something obvious, so any insight would be welcome.

Strange. What does your EMR config look like?

(Once again, thanks for getting back to me.)

The cluster config looks like this:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "dataflow-runner - RDB",
    "logUri": "{{s3_enriched_bucket}}",
    "region": "{{aws_region}}",
    "credentials": {
      "accessKeyId": "default",
      "secretAccessKey": "default"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "4.8.2",
      "keyName": "XXXXX",
      "location": {
        "vpc": {
          "subnetId": "{{subnets}}"
        }
      },
      "instances": {
        "master": {
          "type": "m4.large",
          "ebsConfiguration": {
            "ebsOptimized": true,
            "ebsBlockDeviceConfigs": [
              {
                "volumesPerInstance": 12,
                "volumeSpecification": {
                  "iops": 8,
                  "sizeInGB": 4,
                  "volumeType": "gp2"
                }
              }
            ]
          }
        },
        "core": {
          "type": "m4.large",
          "count": 1
        },
        "task": {
          "type": "m4.large",
          "count": 0,
          "bid": "0.0015"
        }
      }
    },
    "tags": [
      {
        "key": "client",
        "value": "com.engineering"
      },
      {
        "key": "job",
        "value": "main"
      }
    ],
    "bootstrapActionConfigs": [
      {
        "name": "Elasticity Bootstrap Action",
        "scriptBootstrapAction": {
          "path": "s3://snowplow-hosted-assets-eu-west-2/common/emr/snowplow-ami4-bootstrap-0.2.0.sh",
          "args": ["1.5"]
        }
      }
    ],
    "configurations": [
      {
        "classification": "core-site",
        "properties": {
          "Io.file.buffer.size": "65536"
        }
      },
      {
        "classification": "mapred-site",
        "properties": {
          "Mapreduce.user.classpath.first": "true"
        }
      }
    ],
    "applications": ["Hadoop", "Spark"]
  }
}

OK. We had some problems running it on amiVersion "4.8.2"; that's why we switched to 6.1.0.

This is how it looks for us:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "com.myapp", 
    "logUri": "LOGURI",
    "region": "eu-west-1", 
    "credentials": {
      "accessKeyId": "AWS_ACCESS_KEY_ID", 
      "secretAccessKey": "AWS_SECRET_ACCESS_KEY"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "6.1.0",
      "instances": {
          "core": {
              "count": 1,
              "type": "r5.xlarge"
          },
          "master": {
              "ebsConfiguration": {
                  "ebsBlockDeviceConfigs": [],
                  "ebsOptimized": true
              },
              "type": "m4.large"
          },
          "task": {
              "bid": "0.015",
              "count": 0,
              "type": "m4.large"
          }
      },
      "keyName": "EMR_ECS_KEY_PAIR",
      "location": {
          "vpc": {
              "subnetId": "AWS_SUBNET_PUBLIC_ID"
          }
      }
    },
    "tags": [ 
      {
        "key": "client",
        "value": "com.myapp"
      },
      {
        "key": "job",
        "value": "main"
      }
    ],
    "bootstrapActionConfigs": [],
    "configurations": [
      {
        "classification": "spark",
        "configurations": [],
        "properties": {
            "maximizeResourceAllocation": "false"
        }
      },
      {
        "classification": "spark-defaults",
        "configurations": [],
        "properties": {
            "spark.default.parallelism": "8",
            "spark.driver.cores": "1",
            "spark.driver.memory": "9G",
            "spark.dynamicAllocation.enabled": "false",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
            "spark.executor.memory": "9G",
            "spark.yarn.driver.memoryOverhead": "1024",
            "spark.yarn.executor.memoryOverhead": "1024"
        }
      },
      {
        "classification": "yarn-site",
        "configurations": [],
        "properties": {
            "yarn.nodemanager.resource.memory-mb": "30720",
            "yarn.nodemanager.vmem-check-enabled": "false",
            "yarn.scheduler.maximum-allocation-mb": "30720"
        }
      }
    ],
    "applications": [ "Hadoop", "Spark" ]
  }
}