Hi all,
I tried to run EmrEtlRunner using the command below.
I am using EmrEtlRunner version "snowplow_emr_r91_stonehenge_rc9".

./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/ --skip staging
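For reference, the same invocation can be assembled as an argument list, for example from a small wrapper script. This is a minimal sketch: `build_emretlrunner_cmd` is a hypothetical helper, the paths are the ones from the command above, and it assumes that several steps are skipped by passing one comma-separated value to `--skip`.

```python
import shlex
from typing import List, Sequence


def build_emretlrunner_cmd(
    config: str,
    resolver: str,
    enrichments: str,
    skip: Sequence[str] = (),
    binary: str = "./snowplow-emr-etl-runner",
) -> List[str]:
    """Assemble the EmrEtlRunner `run` command as an argv list (hypothetical helper)."""
    cmd = [
        binary, "run",
        "--config", config,
        "--resolver", resolver,
        "--enrichments", enrichments,
    ]
    if skip:
        # Assumption: several steps are skipped by joining them into ONE
        # comma-separated --skip value rather than repeating the flag.
        cmd += ["--skip", ",".join(skip)]
    return cmd


# The command from this post; pass `argv` to subprocess.run(argv, check=True) to execute it.
argv = build_emretlrunner_cmd(
    "snowplow/4-storage/config/emretlrunner.yml",
    "snowplow/3-enrich/config/iglu_resolver.json",
    "snowplow/3-enrich/config/enrichments/",
    skip=["staging"],
)
print(shlex.join(argv))  # shell-quoted form, for copy-pasting into a terminal
```

Keeping the command as a list rather than one string avoids shell-quoting surprises if a path ever contains spaces.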

I have run this command many times and always get the same error, even when using --skip staging and --skip staging,emr.

Below is the error from the command line:

F, [2017-09-26T11:52:41.929000 #10462] FATAL -- : Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-1RGHAMRBK321 failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
	Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-09-26 11:41:12 +0000 - ]
	 - 1. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:07:17 [2017-09-26 11:41:14 +0000 - 2017-09-26 11:48:31 +0000]
	 - 2. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:58 [2017-09-26 11:48:33 +0000 - 2017-09-26 11:50:32 +0000]
	 - 3. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:31 [2017-09-26 11:50:34 +0000 - 2017-09-26 11:51:05 +0000]
	 - 4. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
	 - 5. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
	 - 6. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
	 - 7. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
	 - 8. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
	 - 9. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
	 - 10. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
	 - 11. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
		uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
		uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
		uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
		org/jruby/RubyKernel.java:979:in `load'
		uri:classloader:/META-INF/main.rb:1:in `<main>'
		org/jruby/RubyKernel.java:961:in `require'
		uri:classloader:/META-INF/main.rb:1:in `(root)'
		uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
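To pinpoint which step broke the run, the step lines in this output can be scanned for a FAILED status. A minimal sketch, assuming every step line has the shape `- N. <step name>: STATUS ~ ...` seen above; `first_failed_step` is a hypothetical helper, not part of EmrEtlRunner.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as
#   " - 3. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:31 [...]"
# Group 1 = step number, group 2 = step name, group 3 = status word.
STEP_RE = re.compile(r"^\s*-\s*(\d+)\.\s+(.*):\s+([A-Z_]+)\s+~")


def first_failed_step(log: str) -> Optional[Tuple[int, str]]:
    """Return (number, name) of the lowest-numbered FAILED step, or None."""
    failed: List[Tuple[int, str]] = []
    for line in log.splitlines():
        m = STEP_RE.match(line)
        if m and m.group(3) == "FAILED":
            failed.append((int(m.group(1)), m.group(2)))
    return min(failed) if failed else None
```

On the run above this picks out step 3, the S3DistCp copy of enriched data from HDFS to S3; every later step was only cancelled as a consequence.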

Log of the failed step:

	Exception in thread "main" java.lang.RuntimeException: Error running job
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
	 Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-4-251.ec2.internal:8020/tmp/1b3cec1f-c859-4d4b-b320-63b90e51b52c/files
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:901)
	... 10 more
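In a nested Java stack trace like this one, the most specific error is usually the last `Caused by:` line, here the `InvalidInputException` saying that the HDFS input path for S3DistCp does not exist. A small sketch for pulling that line out of a saved step log; `root_cause` and `exception_class` are hypothetical helpers, and they assume the standard `Caused by: <exception class>: <message>` layout that Java prints.

```python
from typing import Optional


def root_cause(trace: str) -> Optional[str]:
    """Return the text after the last 'Caused by:' in a Java stack trace, or None."""
    cause = None
    for line in trace.splitlines():
        stripped = line.strip()
        if stripped.startswith("Caused by:"):
            # Keep overwriting so the innermost (last printed) cause wins.
            cause = stripped[len("Caused by:"):].strip()
    return cause


def exception_class(cause_line: str) -> str:
    """Split 'pkg.ExceptionClass: message' and return the fully qualified class name."""
    return cause_line.split(":", 1)[0]
```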


My configuration file (emretlrunner.yml):

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: xxxxxx
  secret_access_key: xxxxxx
  #keypair: Snowplowkeypair
  #key-pair-file: /home/ubuntu/snowplow/4-storage/config/Snowplowkeypair.pem
  region: us-east-1
  s3:
    region: us-east-1
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://unilogregion1/logs
      raw:
        in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
          - s3://unilogregion1         # e.g. s3://my-old-collector-bucket
        processing: s3://unilogregion1/raw/processing
        archive: s3://unilogregion1/raw/archive   # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://unilogregion1/enriched/good        # e.g. s3://my-out-bucket/enriched/good
        bad: s3://unilogregion1/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://unilogregion1/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://unilogregion1/enriched/archive    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://unilogregion1/shredded/good        # e.g. s3://my-out-bucket/shredded/good
        bad: s3://unilogregion1/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://unilogregion1/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://unilogregion1/shredded/archive     # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 4.5.0
    region: us-east-1       # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement: us-east-1a      # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: Snowplowkeypair
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:              # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      job_name: Snowplow ETL # Give your job a name
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 100    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: thrift # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
    spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_loader: 0.12.0
    rdb_shredder: 0.12.0        # Version of the Spark Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
    #method: get
    #app_id: unilog # e.g. snowplow
    #collector: # e.g. d3rkrsqld9gmqf.cloudfront.net

Please help me solve this issue.


@shashi Could you avoid using release candidates? They have been produced for testing only and should not be used otherwise.


Thanks for the response.

But there are 10 release candidates of “snowplow_emr_r91_stonehenge”.
Please suggest which one I should select.


@shashi You should always pick the one not labelled rc (rc means release candidate), so that would be snowplow_emr_r91_stonehenge.

Even better would be to use the latest version snowplow_emr_r92_maiden_castle.
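The rule of thumb here (ignore anything with an `_rcN` suffix, then prefer the newest release) can be written down as a tiny filter. A sketch only: it assumes release names follow the `snowplow_emr_r<number>_<codename>[_rc<N>]` pattern seen in this thread, and `latest_stable` is a hypothetical helper, not something shipped by Snowplow.

```python
import re
from typing import Iterable, Optional, Tuple

# e.g. "snowplow_emr_r91_stonehenge" (stable) or "snowplow_emr_r91_stonehenge_rc9" (release candidate).
# Group 1 = release number, group 2 = optional "_rcN" suffix.
RELEASE_RE = re.compile(r"^snowplow_emr_r(\d+)_[a-z_]+?(_rc\d+)?$")


def latest_stable(names: Iterable[str]) -> Optional[str]:
    """Return the highest-numbered release whose name has no _rcN suffix, or None."""
    best: Optional[Tuple[int, str]] = None  # (release number, name)
    for name in names:
        m = RELEASE_RE.match(name)
        if not m or m.group(2):  # unknown format or release candidate: skip it
            continue
        candidate = (int(m.group(1)), name)
        if best is None or candidate > best:
            best = candidate
    return best[1] if best else None
```

Given the three names mentioned in this thread (`snowplow_emr_r91_stonehenge_rc9`, `snowplow_emr_r91_stonehenge` and `snowplow_emr_r92_maiden_castle`), it returns `snowplow_emr_r92_maiden_castle`.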