EMR shred step failed with exitCode -1


#1

My EMR job failed this morning (it had been running successfully since last week).

This is what I found in the controller log:

2017-03-29T19:48:32.155Z WARN Step failed with exitCode -1 and took 2586 seconds

The stderr and stdout logs are empty, and syslog has nothing interesting in it either.

Could you please help me understand what exitCode -1 means?


#2

OK, after I moved all the files back into the “in” folder, the EMR job processed them successfully on its next scheduled run.
But it would still be good to know what exitCode -1 means. Maybe there is a way to prevent these issues in the future.


#3

@tyomo4ka I’m curious why that worked… Did you do anything differently on the second run? E.g. bump your instances or upgrade EMR ETL runner between jobs?

I’ve been re-running enrichment from the Shred step but no cigar. I’m currently re-running from the Enrich step to see if that works.


#4

It worked!

Kind of like you said, @tyomo4ka, all I had to do was:

  1. Remove the files in enriched from the failed run
  2. Re-run enrichment from step “Enrich”
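
For anyone who wants to script step 1, here is a rough sketch of clearing out an S3 prefix. It assumes `s3` behaves like a boto3 S3 client (it only uses `get_paginator("list_objects_v2")` and `delete_objects`). The bucket and prefix in the usage comment are made-up placeholders, not paths from this thread, and `remove_prefix_objects` is a hypothetical helper name. It defaults to a dry run so you can check the key list before deleting anything.

```python
from typing import Any, List


def remove_prefix_objects(
    s3: Any, bucket: str, prefix: str, dry_run: bool = True
) -> List[str]:
    """List every object under bucket/prefix and, unless dry_run, delete it.

    `s3` is assumed to be a boto3 S3 client (or anything with the same
    get_paginator / delete_objects methods). Returns the keys that were
    found, which are also the keys deleted when dry_run is False.
    """
    found: List[str] = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matches has no "Contents" entry at all.
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        if not keys:
            continue
        if not dry_run:
            # list_objects_v2 returns at most 1000 keys per page, which is
            # also the per-request limit of delete_objects.
            s3.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": key} for key in keys]},
            )
        found.extend(keys)
    return found


# Hypothetical usage (bucket and prefix are placeholders, adjust to your config):
#   import boto3
#   keys = remove_prefix_objects(boto3.client("s3"), "my-snowplow-bucket", "enriched/good/")
#   print(keys)  # review first, then call again with dry_run=False
```

Point the prefix at the enriched output location from your own config, and double-check the dry-run output so you only remove the files left behind by the failed run.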

No changes to the pipeline or anything. Figured I’d share back here in case others have issues like this showing up in stderr:

18/07/16 03:42:17 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 172.31.19.185
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1531708082200
	 final status: FAILED
	 tracking URL: http://ip-172-31-27-154.ap-southeast-2.compute.internal:20888/proxy/application_1531707705580_0002/
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1531707705580_0002 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/07/16 03:42:17 INFO ShutdownHookManager: Shutdown hook called
18/07/16 03:42:17 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-6393393a-b249-4816-82a1-56e842e65821
Command exiting with ret '1'
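
One note on reading output like this: it is the spark-submit client side only, so it reports that the YARN application failed but not why. The underlying error usually sits in the YARN container logs for that application ID. Below is a small sketch that pulls the application ID and final status out of the stderr text and builds the standard `yarn logs -applicationId` command to run on the master node. It assumes the cluster is still up and log aggregation is enabled. The function names are just illustrative.

```python
import re
from typing import Optional, Tuple

APP_ID = re.compile(r"application_\d+_\d+")
FINAL_STATUS = re.compile(r"final status:\s*(\w+)")


def summarize_spark_client_log(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (application_id, final_status) found in spark-submit client output.

    Either element is None when the log does not contain it.
    """
    app = APP_ID.search(text)
    status = None
    for match in FINAL_STATUS.finditer(text):
        status = match.group(1)  # keep the last status the client reported
    return (app.group(0) if app else None, status)


def yarn_logs_command(application_id: str) -> str:
    """Build the YARN CLI command that fetches aggregated container logs."""
    return f"yarn logs -applicationId {application_id}"


# Excerpt of the stderr shown above.
SAMPLE = """\
\t final status: FAILED
\t user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1531707705580_0002 finished with failed status
"""

if __name__ == "__main__":
    app_id, status = summarize_spark_client_log(SAMPLE)
    print(status, app_id)
    if app_id:
        print(yarn_logs_command(app_id))
```

If the cluster has already terminated, the same container logs may still be available under the cluster's S3 log location, assuming logging to S3 was configured.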