We are running R97 Knossos – haven’t upgraded in over a year because we never had a problem.
However, last week a couple of our nightly ETL jobs from a CloudFront collector failed at the Shred step. After rerunning with ‘-f shred’, the job completed OK.
But last night the same error occurred again, and this time three recovery attempts have all failed.
Maybe I’m not looking at the right log, but stderr for the failed step is not super informative:
19/03/19 11:07:33 INFO Client:
    client token: N/A
    diagnostics: User class threw exception: org.apache.spark.SparkException: Job aborted.
    ApplicationMaster host: 10.0.0.96
    ApplicationMaster RPC port: 0
    queue: default
    start time: 1552992503630
    final status: FAILED
    tracking URL: http://ip-10-0-0-82.ec2.internal:20888/proxy/application_1552992078044_0002/
    user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1552992078044_0002 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/03/19 11:07:33 INFO ShutdownHookManager: Shutdown hook called
19/03/19 11:07:33 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-570e85f9-ed28-48c3-9ba6-e116cb96b606
Command exiting with ret '1'
At this point I would appreciate any advice at all. Thanks in advance!