Validation error on dataflow runner up

Tejas_Behra · August 10, 2021, 9:26pm

Hi team, I am getting a validation error when running the ./dataflow-runner up --emr-config cluster.json command.

cluster.json:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "RDB Shredder",
    "logUri": "s3://rr-snowplow-events-sample-app-dev/emr-logs/",
    "region":"us-east-1",
    "credentials": {
      "accessKeyId": "xxxxxxxxxxxxxxxxxxxxxxxxx",
      "secretAccessKey": "xxxxxxxxxxxxxxxxxxxxxxxxxx"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "6.2.0",
      "keyName": "snowplow_dev.pem",
      "location": {
        "vpc": {
          "subnetId": "subnet-xxxxxxxx"
        }
      },
      "instances": {
        "master": {
          "type": "m4.large",
          "ebsConfiguration": {
            "ebsOptimized": true,
            "ebsBlockDeviceConfigs": [

            ]
          }
        },
        "core": {
          "type": "r4.xlarge",
          "count": 1
        },
        "task": {
          "type": "m4.large",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [ ],
    "bootstrapActionConfigs": [ ],
    "configurations": [
      {
         "classification":"core-site",
         "properties":{
            "Io.file.buffer.size":"65536"
         },
         "configurations":[

         ]
      },
      {
         "classification":"yarn-site",
         "properties":{
            "yarn.nodemanager.resource.memory-mb":"57344",
            "yarn.scheduler.maximum-allocation-mb":"57344",
            "yarn.nodemanager.vmem-check-enabled":"false"
         },
         "configurations":[

         ]
      },
      {
         "classification":"spark",
         "properties":{
            "maximizeResourceAllocation":"false"
         },
         "configurations":[

         ]
      },
      {
         "classification":"spark-defaults",
         "properties":{
            "spark.executor.memory":"7G",
            "spark.driver.memory":"7G",
            "spark.driver.cores":"3",
            "spark.yarn.driver.memoryOverhead":"1024",
            "spark.default.parallelism":"24",
            "spark.executor.cores":"1",
            "spark.executor.instances":"6",
            "spark.yarn.executor.memoryOverhead":"1024",
            "spark.dynamicAllocation.enabled":"false"
         },
         "configurations":[

         ]
      }
   ],
    "applications": [ "Hadoop", "Spark" ]
  }
}

Tejas_Behra · August 11, 2021, 2:02pm

@ihor any idea what could be wrong?

ihor · August 11, 2021, 8:47pm

@Tejas_Behra, your config is missing the bootstrapping action, which is shown here. Also, you have an "empty" "configurations": [] inside each classification, which I do not see in the sample. What is the actual error message? Could you try following the structure of the sample config more closely?
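
The structural difference mentioned above (empty nested "configurations": [] arrays) can be spotted mechanically before launching. Below is a minimal sketch in Python, assuming the cluster config is plain JSON laid out like the one posted in this thread. The function name find_empty_nested_configurations is made up for illustration and is not part of dataflow-runner, and nothing in this thread establishes that the empty arrays are what causes the failure.

```python
import json


def find_empty_nested_configurations(cluster_config):
    """Return the classifications whose nested "configurations" list is empty.

    Hypothetical helper for illustration only; not part of dataflow-runner.
    """
    data = cluster_config.get("data", {})
    flagged = []
    for entry in data.get("configurations", []):
        # Flag only entries that carry the key explicitly with an empty list.
        if "configurations" in entry and entry["configurations"] == []:
            flagged.append(entry.get("classification", "<unknown>"))
    return flagged


# Trimmed-down version of the config posted above.
sample = json.loads("""
{
  "data": {
    "configurations": [
      {"classification": "core-site",
       "properties": {"Io.file.buffer.size": "65536"},
       "configurations": []},
      {"classification": "spark",
       "properties": {"maximizeResourceAllocation": "false"}}
    ]
  }
}
""")

print(find_empty_nested_configurations(sample))
```

To check a real file, load it with json.load(open("cluster.json")) and pass the result to the same function.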

Tejas_Behra · August 12, 2021, 1:35pm

Hi @ihor, I am getting the following error now (using the same config you mentioned earlier):

ERRO[0000] ValidationException: EBS optimization is not supported for instance type m1.medium.
        status code: 400, request id: 9d5649ae-621f-4444-ab6f-c44277098347
ValidationException: EBS optimization is not supported for instance type m1.medium.
        status code: 400, request id: 9d5649ae-621f-4444-ab6f-c44277098347

Based on the above error, I changed the config as follows, but I am still getting a validation error and the cluster fails to bootstrap:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "dataflow-runner - cluster name",
    "logUri": "s3://rr-snowplow-events-sample-app-dev/emr-logs/",
    "region": "us-east-1",
    "credentials": {
      "accessKeyId": "xxxxxxxxxxxxx",
      "secretAccessKey": "RFxxxxxxxxxxxxxxZtcqgEz"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "4.5.0",
      "keyName": "snowplow_dev.pem",
      "location": {
        "vpc": {
          "subnetId": "subnet-xxxxxxxx"
        }
      },
      "instances": {
        "master": {
          "type": "m1.medium",
          "count": 1
        },
        "core": {
          "type": "m4.xlarge",
          "count": 3,
          "ebsConfiguration": {
            "ebsOptimized": true,
            "ebsBlockDeviceConfigs": [
              {
                "volumesPerInstance": 1,
                "volumeSpecification": {
                  "iops": 1500,
                  "sizeInGB": 100,
                  "volumeType": "io1"
                }
              }
            ]
          }
        },
        "task": {
          "type": "m1.medium",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [
      {
        "key": "client",
        "value": "com.engineering"
      },
      {
        "key": "job",
        "value": "main"
      }
    ],
    "bootstrapActionConfigs": [
      {
        "name": "Elasticity Bootstrap Action",
        "scriptBootstrapAction": {
          "path": "s3://snowplow-hosted-assets-us-east-1/common/emr/snowplow-ami4-bootstrap-0.2.0.sh",
          "args": [ "1.5" ]
        }
      }
    ],
    "configurations": [
      {
        "classification": "core-site",
        "properties": {
          "Io.file.buffer.size": "65536"
        }
      },
      {
        "classification": "mapred-site",
        "properties": {
          "Mapreduce.user.classpath.first": "true"
        }
      }
    ],
    "applications": [ "Hadoop", "Spark" ]
  }
}

INFO[0000] Launching EMR cluster with name 'dataflow-runner - cluster name'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0030] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

ihor · August 12, 2021, 5:16pm

@Tejas_Behra, m1.medium is a very old instance generation. Could you replace it with m4.large? Also, I might have been too direct, but I meant the structure of the sample config, not its values. The amiVersion looks outdated as well. Could you replace it with 6.1.0 and leave bootstrapping as an empty array, []?
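
The three changes suggested here can be applied to the posted config programmatically. The following is a rough sketch in Python, assuming the config has the "data" / "ec2" / "instances" layout shown earlier in this thread. The helper name apply_suggested_changes is hypothetical, and the sample dictionary is a trimmed copy of the posted config rather than the full file.

```python
import copy
import json

OLD_TYPE = "m1.medium"  # old-generation type named in the error message
NEW_TYPE = "m4.large"   # replacement suggested in this reply


def apply_suggested_changes(cluster_config):
    """Return a copy of the config with the three suggested changes applied."""
    updated = copy.deepcopy(cluster_config)
    data = updated["data"]
    ec2 = data["ec2"]
    # 1. Swap every instance group that still uses the old instance type.
    for group in ec2["instances"].values():
        if group.get("type") == OLD_TYPE:
            group["type"] = NEW_TYPE
    # 2. Use the newer release suggested in the reply.
    ec2["amiVersion"] = "6.1.0"
    # 3. Leave bootstrapping as an empty array.
    data["bootstrapActionConfigs"] = []
    return updated


# Trimmed copy of the config posted above (illustrative, not the full file).
sample = {
    "data": {
        "ec2": {
            "amiVersion": "4.5.0",
            "instances": {
                "master": {"type": "m1.medium", "count": 1},
                "core": {"type": "m4.xlarge", "count": 3},
                "task": {"type": "m1.medium", "count": 0, "bid": "0.015"},
            },
        },
        "bootstrapActionConfigs": [{"name": "Elasticity Bootstrap Action"}],
    }
}

updated = apply_suggested_changes(sample)
print(json.dumps(updated, indent=2))
```

The function works on a deep copy, so the original dictionary stays available for comparison against the edited one.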

Tejas_Behra · August 12, 2021, 7:56pm

@ihor I am still getting the same error:

INFO[0000] Launching EMR cluster with name 'dataflow-runner - cluster name'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0030] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

Tejas_Behra · October 11, 2021, 1:06pm

Hi @ihor, any help on this? I am still getting a validation error:

ubuntu@ip-10-0-0-157:~/rr-snowplow/upgrade/modules/dataflow_runner$ ./dataflow-runner up --emr-config cluster_2.json

INFO[0000] Launching EMR cluster with name 'RDB Shredder'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0030] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0060] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0090] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

ian.a · October 14, 2021, 9:30pm

@Tejas_Behra have you tried running with debug log level to see if you get any further details as to what might be failing?

Something like

dataflow-runner [global options] command [command options] [arguments...]


./dataflow-runner --log-level=debug up --emr-config=cluster_2.json
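
Separately from the debug log level, EMR itself records why a cluster changed state, and that reason can be read back from the DescribeCluster API. The sketch below uses Python and assumes boto3 is installed and AWS credentials are configured. The helper names are made up for illustration, the cluster ID has to be looked up separately (for example in the EMR console), and the example response is invented to show the shape of the data, not real output from this thread.

```python
def termination_reason(describe_cluster_response):
    """Extract (state, code, message) from a DescribeCluster response dict."""
    status = describe_cluster_response["Cluster"]["Status"]
    reason = status.get("StateChangeReason", {})
    return status.get("State"), reason.get("Code"), reason.get("Message")


def fetch_termination_reason(cluster_id, region="us-east-1"):
    """Call EMR DescribeCluster and return the state change reason."""
    # Imported lazily so termination_reason() can be used without boto3.
    import boto3

    client = boto3.client("emr", region_name=region)
    return termination_reason(client.describe_cluster(ClusterId=cluster_id))


# Invented response, only to illustrate the fields being read.
example_response = {
    "Cluster": {
        "Status": {
            "State": "TERMINATED_WITH_ERRORS",
            "StateChangeReason": {
                "Code": "VALIDATION_ERROR",
                "Message": "placeholder message, not real output",
            },
        }
    }
}

print(termination_reason(example_response))
```

Against a real cluster, fetch_termination_reason would be called with the ID of the failed cluster; the returned message is usually more specific than "failed to launch with state TERMINATING".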

Tejas_Behra · October 15, 2021, 6:45pm

@ian.a I tried the debug log setting but got the same error:
ubuntu@ip-10-0-0-157:~/rr-snowplow/upgrade/modules/dataflow_runner$ ./dataflow-runner --log-level=debug up --emr-config cluster_2.json
INFO[0000] Launching EMR cluster with name 'RDB Shredder'...
INFO[0000] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0030] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
INFO[0060] EMR cluster is in state STARTING - need state WAITING, checking again in 30 seconds...
ERRO[0090] EMR cluster failed to launch with state TERMINATING
EMR cluster failed to launch with state TERMINATING

mike · October 15, 2021, 10:05pm

Can you paste your updated cluster_2.json config here?

Tejas_Behra · October 17, 2021, 7:32pm

Here it is

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "RDB Shredder",
    "logUri": "s3://rr-snowplow-events-sample-app-dev/emr-logs/",
    "region":"us-east-1",
    "credentials": {
      "accessKeyId": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
      "secretAccessKey": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "6.2.0",
      "keyName": "snowplow_dev.pem",
      "location": {
        "vpc": {
          "subnetId": "subnet-XXXXXXX"
        }
      },
      "instances": {
        "master": {
          "type": "m4.large",
          "ebsConfiguration": {
            "ebsOptimized": true,
            "ebsBlockDeviceConfigs": [

            ]
          }
        },
        "core": {
          "type": "r4.xlarge",
          "count": 1
        },
        "task": {
          "type": "m4.large",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [ ],
    "bootstrapActionConfigs": [
      {
        "name": "Elasticity Bootstrap Action",
        "scriptBootstrapAction": {
          "path": "s3://snowplow-hosted-assets-us-east-1/common/emr/snowplow-ami4-bootstrap-0.2.0.sh",
          "args": [ "1.5" ]
        }
      }
    ],
    "configurations": [
      {
         "classification":"core-site",
         "properties":{
            "Io.file.buffer.size":"65536"
         },
         "configurations":[

         ]
      },
      {
         "classification":"yarn-site",
         "properties":{
            "yarn.nodemanager.resource.memory-mb":"57344",
            "yarn.scheduler.maximum-allocation-mb":"57344",
            "yarn.nodemanager.vmem-check-enabled":"false"
         },
         "configurations":[

         ]
      },
      {
         "classification":"spark",
         "properties":{
            "maximizeResourceAllocation":"false"
         },
         "configurations":[

         ]
      },
      {
         "classification":"spark-defaults",
         "properties":{
            "spark.executor.memory":"7G",
            "spark.driver.memory":"7G",
            "spark.driver.cores":"3",
            "spark.yarn.driver.memoryOverhead":"1024",
            "spark.default.parallelism":"24",
            "spark.executor.cores":"1",
            "spark.executor.instances":"6",
            "spark.yarn.executor.memoryOverhead":"1024",
            "spark.dynamicAllocation.enabled":"false"
         },
         "configurations":[

         ]
      }
   ],
    "applications": [ "Hadoop", "Spark" ]
  }
}

mike · October 17, 2021, 11:56pm

This mostly looks correct as far as I can tell.

Is that subnet attached to a VPC with available IP addresses, and do the two roles you’ve specified exist?

Tejas_Behra · October 18, 2021, 2:15pm

@mike I just checked: the subnet is attached to a VPC and it is public. The two roles also exist.
