Validation error on dataflow runner up

Hi Team, I am getting a validation error when running ./dataflow-runner up --emr-config cluster.json with the following config:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "RDB Shredder",
    "logUri": "s3://rr-snowplow-events-sample-app-dev/emr-logs/",
    "credentials": {
      "accessKeyId": "xxxxxxxxxxxxxxxxxxxxxxxxx",
      "secretAccessKey": "xxxxxxxxxxxxxxxxxxxxxxxxxx"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "6.2.0",
      "keyName": "snowplow_dev.pem",
      "location": {
        "vpc": {
          "subnetId": "subnet-xxxxxxxx"
        }
      },
      "instances": {
        "master": {
          "type": "m4.large",
          "ebsConfiguration": {
            "ebsOptimized": true,
            "ebsBlockDeviceConfigs": [ ]
          }
        },
        "core": {
          "type": "r4.xlarge",
          "count": 1
        },
        "task": {
          "type": "m4.large",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [ ],
    "bootstrapActionConfigs": [ ],
    "configurations": [ ],
    "applications": [ "Hadoop", "Spark" ]
  }
}

@ihor, any idea what could be wrong?

@Tejas_Behra, your config is missing the bootstrapping action, which is depicted here. Also, you have an "empty" "configurations": [], which I do not see in the sample. What is the actual error message? Could you try following the structure of the sample config more closely?
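
One way to compare a config against the sample by structure only, ignoring the values, is to diff the key paths of the two JSON documents. The following is a minimal sketch in Python; `key_paths` and `compare_structure` are hypothetical helper names (not part of Dataflow Runner), and the two inline documents are small stand-ins for the real sample and cluster.json files.

```python
import json


def key_paths(node, prefix=""):
    """Collect the set of key paths in a parsed JSON document, ignoring values."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        # Every array item shares one "[]" segment, so array length is ignored.
        for item in node:
            paths |= key_paths(item, prefix + "[]")
    return paths


def compare_structure(sample, mine):
    """Return (missing, extra) key paths of `mine` relative to `sample`."""
    sample_paths = key_paths(sample)
    my_paths = key_paths(mine)
    return sorted(sample_paths - my_paths), sorted(my_paths - sample_paths)


if __name__ == "__main__":
    # Stand-in documents; in practice, load both files with json.load().
    sample = json.loads(
        '{"data": {"region": "us-east-1",'
        ' "configurations": [{"classification": "core-site"}]}}'
    )
    mine = json.loads('{"data": {"configurations": []}}')
    missing, extra = compare_structure(sample, mine)
    print("missing from my config:", missing)
    print("not in the sample:", extra)
```

A path reported as missing is not necessarily an error, since some fields may be optional; the output is only a list of places to look at.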

Hi @ihor, I am now getting the following error (using the same config as the sample you mentioned earlier):

ERRO[0000] ValidationException: EBS optimization is not supported for instance type m1.medium.
        status code: 400, request id: 9d5649ae-621f-4444-ab6f-c44277098347
ValidationException: EBS optimization is not supported for instance type m1.medium.
        status code: 400, request id: 9d5649ae-621f-4444-ab6f-c44277098347

Based on the above error, I changed the config as follows, but I am still getting a validation error and the cluster fails to bootstrap:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "dataflow-runner - cluster name",
    "logUri": "s3://rr-snowplow-events-sample-app-dev/emr-logs/",
    "region": "us-east-1",
    "credentials": {
      "accessKeyId": "xxxxxxxxxxxxx",
      "secretAccessKey": "RFxxxxxxxxxxxxxxZtcqgEz"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "4.5.0",
      "keyName": "snowplow_dev.pem",
      "location": {
        "vpc": {
          "subnetId": "subnet-xxxxxxxx"
        }
      },
      "instances": {
        "master": {
          "type": "m1.medium",
          "count": 1
        },
        "core": {
          "type": "m4.xlarge",
          "count": 3,
          "ebsConfiguration": {
            "ebsOptimized": true,
            "ebsBlockDeviceConfigs": [
              {
                "volumesPerInstance": 1,
                "volumeSpecification": {
                  "iops": 1500,
                  "sizeInGB": 100,
                  "volumeType": "io1"
                }
              }
            ]
          }
        },
        "task": {
          "type": "m1.medium",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [
      {
        "key": "client",
        "value": ""
      },
      {
        "key": "job",
        "value": "main"
      }
    ],
    "bootstrapActionConfigs": [
      {
        "name": "Elasticity Bootstrap Action",
        "scriptBootstrapAction": {
          "path": "s3://snowplow-hosted-assets-us-east-1/common/emr/",
          "args": [ "1.5" ]
        }
      }
    ],
    "configurations": [
      {
        "classification": "core-site",
        "properties": {
          "Io.file.buffer.size": "65536"
        }
      },
      {
        "classification": "mapred-site",
        "properties": {
          "Mapreduce.user.classpath.first": "true"
        }
      }
    ],
    "applications": [ "Hadoop", "Spark" ]
  }
}

INFO[0000] Launching EMR cluster with name 'dataflow-runner - cluster name'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0030] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

@Tejas_Behra, m1.medium is a very old generation. Could you replace it with m4.large? Also, I might have been too direct, but I meant following the structure of the sample config, not its values. The amiVersion looks outdated as well. Could you replace it with 6.1.0 and leave bootstrapping as an empty array, []?
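
The "EBS optimization is not supported for instance type m1.medium" error above can be caught before launching by scanning the instance groups in the config. This is a rough sketch in Python under stated assumptions: `find_ebs_conflicts` and `NO_EBS_OPTIMIZATION` are hypothetical names, and the set contains only m1.medium because that is the only type the error message in this thread confirms.

```python
# Instance types assumed NOT to support EBS optimization. Only m1.medium is
# confirmed by the error message in this thread; extend the set as needed.
NO_EBS_OPTIMIZATION = {"m1.medium"}


def find_ebs_conflicts(config):
    """Return the instance groups that request EBS optimization on an unsupported type."""
    instances = config.get("data", {}).get("ec2", {}).get("instances", {})
    conflicts = []
    for group_name, group in instances.items():
        ebs = group.get("ebsConfiguration", {})
        if ebs.get("ebsOptimized") and group.get("type") in NO_EBS_OPTIMIZATION:
            conflicts.append(group_name)
    return conflicts


if __name__ == "__main__":
    # Hypothetical config fragment; in practice, load cluster.json with json.load().
    config = {
        "data": {
            "ec2": {
                "instances": {
                    "master": {
                        "type": "m1.medium",
                        "ebsConfiguration": {"ebsOptimized": True},
                    },
                    "core": {
                        "type": "m4.xlarge",
                        "ebsConfiguration": {"ebsOptimized": True},
                    },
                }
            }
        }
    }
    print(find_ebs_conflicts(config))  # prints ['master']
```

The check only covers this one known conflict; it does not validate anything else that EMR might reject.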

@ihor, I am still getting the same error:

INFO[0000] Launching EMR cluster with name 'dataflow-runner - cluster name'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0030] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

Hi @ihor, any help on this? I am still getting a validation error.

ubuntu@ip-10-0-0-157:~/rr-snowplow/upgrade/modules/dataflow_runner$ ./dataflow-runner up --emr-config cluster_2.json

INFO[0000] Launching EMR cluster with name 'RDB Shredder'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0030] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0060] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0090] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

@Tejas_Behra, have you tried running with the debug log level to see if you get any further details about what might be failing?

Something like

dataflow-runner [global options] command [command options] [arguments...]

./dataflow-runner --log-level=debug up --emr-config=cluster_2.json

@ian.a, I tried with the debug log level but got the same error:
ubuntu@ip-10-0-0-157:~/rr-snowplow/upgrade/modules/dataflow_runner$ ./dataflow-runner --log-level=debug up --emr-config cluster_2.json
INFO[0000] Launching EMR cluster with name 'RDB Shredder'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0030] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0060] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0090] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

Can you paste your updated cluster_2.json config here?

Here it is:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "RDB Shredder",
    "logUri": "s3://rr-snowplow-events-sample-app-dev/emr-logs/",
    "credentials": {
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "6.2.0",
      "keyName": "snowplow_dev.pem",
      "location": {
        "vpc": {
          "subnetId": "subnet-XXXXXXX"
        }
      },
      "instances": {
        "master": {
          "type": "m4.large",
          "ebsConfiguration": {
            "ebsOptimized": true,
            "ebsBlockDeviceConfigs": [ ]
          }
        },
        "core": {
          "type": "r4.xlarge",
          "count": 1
        },
        "task": {
          "type": "m4.large",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [ ],
    "bootstrapActionConfigs": [
      {
        "name": "Elasticity Bootstrap Action",
        "scriptBootstrapAction": {
          "path": "s3://snowplow-hosted-assets-us-east-1/common/emr/",
          "args": [ "1.5" ]
        }
      }
    ],
    "configurations": [ ],
    "applications": [ "Hadoop", "Spark" ]
  }
}

This mostly looks correct as far as I can tell.

Is that subnet attached to a VPC with available IP addresses etc., and do the two roles you've specified exist?

@mike, I just checked: the subnet is attached to a VPC and it is public. The two roles also exist.