EmrEtlRunner::EmrExecutionError while storing the events in redshift database

sandesh · October 13, 2017, 1:10pm

I am trying to load the events (stored in an S3 bucket) into a Redshift database.
Below is my architecture:

JavaScript Tracker --> Scala Stream Collector --> Stream enrich --> Kinesis S3 --> S3 -> EmrEtlRunner (shredding) -> Redshift

With a lot of effort, I have got everything up to EmrEtlRunner (shredding) working.

Now I am on the last step, i.e. storing the events in the database (Redshift).

Below are the configuration details for storage (Redshift).

My config.yml file:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: XXXXX
  secret_access_key: XXXXX
  #keypair: Snowplowkeypair
  #key-pair-file: /home/ubuntu/snowplow/4-storage/config/Snowplowkeypair.pem
  region: us-east-1
  s3:
    region: us-east-1
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://snowplowdataevents2/logs
      raw:
        in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
          - s3://snowplowdataevents2/      # e.g. s3://my-old-collector-bucket
        processing: s3://snowplowdataevents2/raw/processing
        archive: s3://snowplowdataevents2/raw/archive   # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://snowplowdataevents2/enriched/good        # e.g. s3://my-out-bucket/enriched/good
        bad: s3://snowplowdataevents2/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://snowplowdataevents2/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://snowplowdataevents2/enriched/archive    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://snowplowdataevents2/shredded/good        # e.g. s3://my-out-bucket/shredded/good
        bad: s3://snowplowdataevents2/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://snowplowdataevents2/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://snowplowdataevents2/shredded/archive     # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.5.0
    region: us-east-1       # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement: us-east-1a      # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: Snowplowkeypair
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:              # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      job_name: Snowplow ETL # Give your job a name
      master_instance_type: m2.4xlarge
      core_instance_count: 2
      core_instance_type: m2.4xlarge
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 100    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m2.4xlarge
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: thrift # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
    spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_loader: 0.12.0
    rdb_shredder: 0.12.0        # Version of the Spark Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  #snowplow:
    #method: get
    #app_id: unilog # e.g. snowplow
    #collector: 172.31.38.39:8082 # e.g. d3rkrsqld9gmqf.cloudfront.net

Below is my iglu_resolver.json file:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
	"cacheSize": 500,
	"repositories": [
	  {
		"name": "Iglu Central",
		"priority": 0,
		"vendorPrefixes": [ "com.snowplowanalytics" ],
		"connection": {
		  "http": {
			"uri": "http://iglucentral.com"
		  }
		}
	  }
	]
  }
}

Below is the redshift.json file in config/targets/:

{
	"schema": "iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/1-0-0",
	"data": {
		"name": "AWS Redshift enriched events storage",
		"host": "xxxx",
		"database": "unilog",
		"port": 5439,
		"sslMode": "DISABLE",
		"username": "unilog",
		"password": "xxxx",
		"schema": "atomic",
		"maxError": 1,
		"compRows": 20000,
		"purpose": "ENRICHED_EVENTS"
	}
}
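
Both files above are self-describing JSON: a top-level "schema" property holding an Iglu URI (iglu:vendor/name/format/version) and a top-level "data" object. A minimal Python sketch for checking that shape before a run; the function name check_self_describing is hypothetical, and it only checks structure, not whether "data" actually validates against the referenced schema.

```python
import json
from typing import Tuple


def check_self_describing(text: str) -> Tuple[str, str, str, str]:
    """Parse a self-describing JSON document and return the parts of its
    Iglu URI as (vendor, name, format, version).

    Raises ValueError if the top-level "schema" or "data" property is
    missing or malformed. Structural check only: "data" is NOT validated
    against the referenced schema.
    """
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("top level is not a JSON object")
    schema = doc.get("schema")
    if not isinstance(schema, str) or not schema.startswith("iglu:"):
        raise ValueError("'schema' property is absent or not an iglu: URI")
    if not isinstance(doc.get("data"), dict):
        raise ValueError("'data' property is absent or not an object")
    parts = schema[len("iglu:"):].split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected Iglu URI shape: {schema}")
    vendor, name, fmt, version = parts
    return vendor, name, fmt, version


if __name__ == "__main__":
    import sys
    # Usage: python check_sd.py snowplow/4-storage/config/targets/redshift.json
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            print(path, check_self_describing(fh.read()))
```

Note that the nested "schema": "atomic" inside "data" is untouched; only the top-level property is inspected.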

The error I get in the Ubuntu CLI is:

ubuntu@ip-172-31-38-39:~$ ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/4-storage/config/iglu_resolver.json --targets snowplow/4-storage/config/targets/ --skip analyze
D, [2017-10-13T12:37:01.241000 #6617] DEBUG -- : Initializing EMR jobflow
D, [2017-10-13T12:37:27.536000 #6617] DEBUG -- : EMR jobflow j-VN38KCZI51FD started, waiting for jobflow to complete...
I, [2017-10-13T12:49:30.949000 #6617]  INFO -- : No RDB Loader logs
F, [2017-10-13T12:49:31.299000 #6617] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-VN38KCZI51FD failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-13 12:42:48 +0000 - ]
 - 1. Elasticity S3DistCp Step: Raw s3://snowplowdataevents2/ -> Raw Staging S3: COMPLETED ~ 00:02:38 [2017-10-13 12:42:49 +0000 - 2017-10-13 12:45:28 +0000]
 - 2. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:02:12 [2017-10-13 12:45:30 +0000 - 2017-10-13 12:47:42 +0000]
 - 3. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:02 [2017-10-13 12:47:44 +0000 - 2017-10-13 12:48:46 +0000]
 - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [2017-10-13 12:48:48 +0000 - 2017-10-13 12:48:54 +0000]
 - 5. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 6. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 7. Elasticity Custom Jar Step: Load AWS Redshift enriched events storage Storage Target: CANCELLED ~ elapsed time n/a [ - ]
 - 8. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 9. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 10. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 11. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 12. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
 - 13. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'
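
The step list in an EmrExecutionError message is long, and only the first FAILED step matters for troubleshooting. A small Python sketch, assuming the runner's output has been saved to a text file; parse_steps and first_failed are hypothetical helpers, and the regular expression is written against the line format shown above.

```python
import re
from typing import List, Optional, Tuple

# Matches step lines from the EmrExecutionError message, e.g.
#  - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [...]
# The greedy (.*) keeps colons inside the step name and splits on the
# last ": STATUS ~" in the line.
STEP_LINE = re.compile(r"^\s*-\s*(\d+)\.\s+(.*):\s+([A-Z_]+)\s+~")


def parse_steps(output: str) -> List[Tuple[int, str, str]]:
    """Return (number, step name, status) for every step line found."""
    steps = []
    for line in output.splitlines():
        m = STEP_LINE.match(line)
        if m:
            steps.append((int(m.group(1)), m.group(2).strip(), m.group(3)))
    return steps


def first_failed(output: str) -> Optional[Tuple[int, str, str]]:
    """Return the first listed step whose status is FAILED, or None."""
    for step in parse_steps(output):
        if step[2] == "FAILED":
            return step
    return None


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(first_failed(fh.read()))
```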

In the S3 log bucket, I checked the logs for that particular step.
Below is the error (from its stderr.gz file):

Exception in thread "main" java.lang.RuntimeException: Error running job
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-11-139.ec2.internal:8020/tmp/286fd0b7-6f45-4d16-bc13-fb69f5a294f9/files
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:901)
	... 10 more
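
In a Java trace like this one, the informative line is usually the "Caused by:" entry (here, the InvalidInputException naming the missing HDFS path). A minimal Python sketch for pulling those lines out of a stderr.gz file, assuming it has already been downloaded from the log bucket to local disk; root_causes is a hypothetical helper, not part of EmrEtlRunner.

```python
import gzip
from typing import List


def root_causes(path: str) -> List[str]:
    """Return the top-level exception line and every 'Caused by:' line
    from a gzipped stderr log."""
    hits = []
    # "rt" opens the gzip stream in text mode; undecodable bytes are replaced.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            text = line.strip()
            if text.startswith("Exception in thread") or text.startswith("Caused by:"):
                hits.append(text)
    return hits


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        for hit in root_causes(p):
            print(hit)
```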

I am at the final stage; please help me find what I am missing.
The job fails at step 4 of the process, about 6 seconds after that step starts.

ihor · October 14, 2017, 12:25am

@sandesh,

Your workflow looks alright and seems to follow the one depicted in “How to setup a Lambda architecture for Snowplow”, though the name “Stream enrich” sounds odd, as the data flowing at that point is still “raw” (not enriched). Also note that EmrEtlRunner will be enriching your data too (it is not just shredding). Thus, the correct workflow would be:

JS Tracker -> Scala Stream Collector -> Raw Stream -> Kinesis S3 -> S3 -> EmrEtlRunner -> Redshift

Your error message suggests the files failed to be copied from HDFS (the EMR cluster's internal storage) to the S3 bucket after enrichment:

Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [2017-10-13 12:48:48 +0000 - 2017-10-13 12:48:54 +0000]

The logs indicate:

Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-11-139.ec2.internal:8020/tmp/286fd0b7-6f45-4d16-bc13-fb69f5a294f9/files

It could be just a “glitch” (e.g. the EMR cluster terminated prematurely), or no “good” enriched files were produced at all (say, all events ended up in the “bad” bucket).
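
One way to tell the two cases apart is to compare how many bytes landed under the enriched good and bad prefixes for the run. A minimal sketch, assuming you have saved the output of `aws s3 ls s3://<bucket>/ --recursive`, whose lines have the form `date time size key`; prefix_usage is a hypothetical helper and the sample keys in the test are illustrative.

```python
from typing import Iterable, Tuple


def prefix_usage(lines: Iterable[str], prefix: str) -> Tuple[int, int]:
    """Count objects and total bytes whose key starts with `prefix`.

    Each line is expected to look like `aws s3 ls ... --recursive` output:
        2017-10-16 15:16:50       2048 enriched/bad/run=.../part-00000
    Lines that do not fit this shape (blank lines, "PRE" entries) are skipped.
    """
    count = total = 0
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) != 4 or not parts[2].isdigit():
            continue
        size, key = int(parts[2]), parts[3]
        if key.startswith(prefix):
            count += 1
            total += size
    return count, total


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as fh:
        listing = fh.read().splitlines()
    for p in ("enriched/good/", "enriched/bad/"):
        print(p, prefix_usage(listing, p))
```

If the good prefix holds only zero-byte objects while the bad prefix holds data, that points to the second explanation rather than a transient failure.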

You might want to resume the pipeline with the --skip staging option in case it was a temporary failure. Do ensure the “good” bucket is empty before rerunning. The resume steps (depending on the failure point) can be found here: https://github.com/snowplow/snowplow/wiki/Batch-pipeline-steps
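
For a rerun, the command stays the same apart from the added --skip option. A small Python sketch that assembles the argument list from the flags used in this thread (--config, --resolver, --enrichments, --targets, --skip); resume_command is a hypothetical helper, and it accepts a single --skip value only, since only single values appear here.

```python
import shlex
from typing import List, Optional


def resume_command(config: str, resolver: str,
                   enrichments: Optional[str] = None,
                   targets: Optional[str] = None,
                   skip: Optional[str] = None) -> List[str]:
    """Build the EmrEtlRunner argv list; optional flags are added only when set."""
    argv = ["./snowplow-emr-etl-runner", "run",
            "--config", config, "--resolver", resolver]
    if enrichments:
        argv += ["--enrichments", enrichments]
    if targets:
        argv += ["--targets", targets]
    if skip:
        argv += ["--skip", skip]
    return argv


if __name__ == "__main__":
    argv = resume_command(
        "snowplow/4-storage/config/emretlrunner.yml",
        "snowplow/4-storage/config/iglu_resolver.json",
        targets="snowplow/4-storage/config/targets/",
        skip="staging",
    )
    # shlex.join (Python 3.8+) quotes arguments only where needed.
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Keeping the command as a list makes it easy to pass straight to subprocess.run without shell quoting concerns.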

I wouldn’t use the same bucket both for your raw “in” events and for the files produced during processing, enrichment and shredding.
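
With raw:in set to the bucket root (s3://snowplowdataevents2/), the processing, enriched, shredded and log paths in the config above all sit inside the input location. A Python sketch that flags such overlaps, assuming the relevant values have been copied into a plain dict (loading the YAML directly would need a third-party parser such as PyYAML); overlapping_paths is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def _as_prefix(uri: str) -> str:
    """Normalise an S3 URI to end with exactly one '/' so that
    s3://bucket does not falsely match s3://bucket2/."""
    return uri.rstrip("/") + "/"


def overlapping_paths(raw_in: List[str], others: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (raw_in path, config key) pairs where either path contains the other."""
    hits = []
    for src in raw_in:
        src_prefix = _as_prefix(src)
        for name, uri in others.items():
            if not uri:  # blank entries (e.g. jsonpath_assets) are ignored
                continue
            other_prefix = _as_prefix(uri)
            if other_prefix.startswith(src_prefix) or src_prefix.startswith(other_prefix):
                hits.append((src, name))
    return hits


if __name__ == "__main__":
    # Values copied by hand from the config.yml in this thread.
    raw_in = ["s3://snowplowdataevents2/"]
    others = {
        "log": "s3://snowplowdataevents2/logs",
        "raw.processing": "s3://snowplowdataevents2/raw/processing",
        "enriched.good": "s3://snowplowdataevents2/enriched/good",
        "shredded.good": "s3://snowplowdataevents2/shredded/good",
    }
    for src, name in overlapping_paths(raw_in, others):
        print(f"{name} is inside or contains raw:in {src}")
```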

Additionally, I can see you are using m2.4xlarge instances. Those are old generation types and do not require/support EBS storage. You could use either 1 x c4.8xlarge or 1 x m4.10xlarge instead.

sandesh · October 16, 2017, 3:38pm

Hey @ihor, thanks for the great suggestion.
I emptied the good bucket and re-ran EmrEtlRunner.
If I run the process without --skip staging, all the EmrEtlRunner steps complete, but the run folder in enriched/good/ is 0 KB, while the run folder in shredded/good/ is created successfully.
If I run it with --skip staging, I get the error below at step 3 of the process.
This is the command I used to run EmrEtlRunner:

./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/ --skip staging

I get the following error in the Ubuntu CLI:
D, [2017-10-16T15:07:45.058000 #2642] DEBUG -- : Initializing EMR jobflow
D, [2017-10-16T15:07:48.859000 #2642] DEBUG -- : EMR jobflow j-3I1KJNJKU8JV2 started, waiting for jobflow to complete...
F, [2017-10-16T15:17:54.079000 #2642] FATAL -- :

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-3I1KJNJKU8JV2 failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2017-10-16 15:12:59 +0000 - ]
 - 1. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: COMPLETED ~ 00:02:38 [2017-10-16 15:13:01 +0000 - 2017-10-16 15:15:39 +0000]
 - 2. Elasticity Spark Step: Enrich Raw Events: COMPLETED ~ 00:01:04 [2017-10-16 15:15:41 +0000 - 2017-10-16 15:16:45 +0000]
 - 3. Elasticity S3DistCp Step: Enriched HDFS -> S3: FAILED ~ 00:00:06 [2017-10-16 15:16:47 +0000 - 2017-10-16 15:16:53 +0000]
 - 4. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 6. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 7. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 8. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 9. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 10. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
 - 11. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

When I checked the stderr.gz file in the logs for that particular step, this is what I found:

17/10/16 15:15:44 INFO RMProxy: Connecting to ResourceManager at ip-172-31-7-95.ec2.internal/172.31.7.95:8032
17/10/16 15:15:45 INFO Client: Requesting a new application from cluster with 2 NodeManagers
17/10/16 15:15:45 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (61440 MB per container)
17/10/16 15:15:45 INFO Client: Will allocate AM container, with 61440 MB memory including 5585 MB overhead
17/10/16 15:15:45 INFO Client: Setting up container launch context for our AM
17/10/16 15:15:45 INFO Client: Setting up the launch environment for our AM container
17/10/16 15:15:45 INFO Client: Preparing resources for our AM container
17/10/16 15:15:46 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/10/16 15:15:49 INFO Client: Uploading resource file:/mnt/tmp/spark-9dc8c996-c571-417e-9e58-aba2fb8b306a/__spark_libs__1073128864940540338.zip -> hdfs://ip-172-31-7-95.ec2.internal:8020/user/hadoop/.sparkStaging/application_1508166661063_0002/__spark_libs__1073128864940540338.zip
17/10/16 15:15:54 INFO Client: Uploading resource s3://snowplow-hosted-assets-us-east-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar -> hdfs://ip-172-31-7-95.ec2.internal:8020/user/hadoop/.sparkStaging/application_1508166661063_0002/snowplow-spark-enrich-1.9.0.jar
17/10/16 15:15:54 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets-us-east-1/3-enrich/spark-enrich/snowplow-spark-enrich-1.9.0.jar' for reading
17/10/16 15:15:56 INFO Client: Uploading resource file:/mnt/tmp/spark-9dc8c996-c571-417e-9e58-aba2fb8b306a/__spark_conf__5417205837143737464.zip -> hdfs://ip-172-31-7-95.ec2.internal:8020/user/hadoop/.sparkStaging/application_1508166661063_0002/__spark_conf__.zip
17/10/16 15:15:56 INFO SecurityManager: Changing view acls to: hadoop
17/10/16 15:15:56 INFO SecurityManager: Changing modify acls to: hadoop
17/10/16 15:15:56 INFO SecurityManager: Changing view acls groups to: 
17/10/16 15:15:56 INFO SecurityManager: Changing modify acls groups to: 
17/10/16 15:15:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
17/10/16 15:15:56 INFO Client: Submitting application application_1508166661063_0002 to ResourceManager
17/10/16 15:15:57 INFO YarnClientImpl: Submitted application application_1508166661063_0002
17/10/16 15:15:58 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:15:58 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1508166956994
	 final status: UNDEFINED
	 tracking URL: http://ip-172-31-7-95.ec2.internal:20888/proxy/application_1508166661063_0002/
	 user: hadoop
17/10/16 15:15:59 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:00 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:01 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:02 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:03 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:04 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:05 INFO Client: Application report for application_1508166661063_0002 (state: ACCEPTED)
17/10/16 15:16:06 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:06 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 172.31.12.106
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1508166956994
	 final status: UNDEFINED
	 tracking URL: http://ip-172-31-7-95.ec2.internal:20888/proxy/application_1508166661063_0002/
	 user: hadoop
17/10/16 15:16:07 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:08 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:09 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:10 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:11 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:12 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:13 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:14 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:15 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:16 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:17 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:18 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:19 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:20 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:21 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:22 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:23 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:24 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:25 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:26 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:27 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:28 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:29 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:30 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:31 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:32 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:33 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:34 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:35 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:36 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:37 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:38 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:39 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:40 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:41 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:42 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:43 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:44 INFO Client: Application report for application_1508166661063_0002 (state: RUNNING)
17/10/16 15:16:45 INFO Client: Application report for application_1508166661063_0002 (state: FINISHED)
17/10/16 15:16:45 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 172.31.12.106
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1508166956994
	 final status: SUCCEEDED
	 tracking URL: http://ip-172-31-7-95.ec2.internal:20888/proxy/application_1508166661063_0002/
	 user: hadoop
17/10/16 15:16:45 INFO ShutdownHookManager: Shutdown hook called
17/10/16 15:16:45 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-9dc8c996-c571-417e-9e58-aba2fb8b306a
Command exiting with ret '0'
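
The excerpt above appears to come from the Spark enrich step (it uploads snowplow-spark-enrich-1.9.0.jar), and it ends with "final status: SUCCEEDED" and exit code 0. A quick Python sketch for pulling those two markers out of any step's stderr text, to confirm which log belongs to the failing step; step_outcome is a hypothetical helper written against the line formats shown above.

```python
import re
from typing import Optional, Tuple

# Report lines printed by the Spark YARN client, e.g. "\t final status: SUCCEEDED"
FINAL_STATUS = re.compile(r"^\s*final status:\s*(\w+)")
# Trailer line, e.g. "Command exiting with ret '0'"
EXIT_LINE = re.compile(r"^Command exiting with ret '(-?\d+)'")


def step_outcome(log_text: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (last 'final status' value, exit code) found in a step log."""
    status: Optional[str] = None
    code: Optional[int] = None
    for line in log_text.splitlines():
        m = FINAL_STATUS.match(line)
        if m:
            status = m.group(1)  # later reports override earlier ones
        m = EXIT_LINE.match(line)
        if m:
            code = int(m.group(1))
    return status, code
```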

Please help me…

Topic		Replies	Views
EmrEtlRunner sink Shredded data into S3 bucket For engineers	0	599	November 11, 2019
Loading data from s3 to Redshift after EmrEtlRunner Troubleshooting	7	3255	November 19, 2018
IgluError (JSON instance is not self-describing (schema property is absent) AWS batch pipeline (Legacy)	6	2060	October 5, 2017
Should I run rdb_load only? For engineers	7	1006	February 11, 2020
EmrEtlRunner not loading data into RedShift For engineers	22	1888	November 11, 2019