Suggested best practices for recovering from EmrEtlRunner failures?

Hey @alex,

Looking back over the logs, I found two cases where the schema lookup against Iglu Central seems to have been the problem: in both, the schema was not found in the embedded repository and Iglu Central returned an HTTP 500. I’ll paste the logs that contain the actual error message. If other logs from these runs would help, tell me which ones and I can get them for you.

Here are the contents of logs/<bucket>/steps/<step>/stderr.gz for an EMR cluster that was created at 2016-07-16 13:24 (UTC-4):

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.twitter.scalding.Job$.apply(Job.scala:47)
	at com.twitter.scalding.Tool.getJob(Tool.scala:48)
	at com.twitter.scalding.Tool.run(Tool.scala:68)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner$.main(JobRunner.scala:33)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner.main(JobRunner.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: com.snowplowanalytics.snowplow.enrich.common.FatalEtlError: NonEmptyList(error: NonEmptyList(error: Could not find schema with key iglu:com.snowplowanalytics.snowplow/currency_conversion_config/jsonschema/1-0-0 in any repository, tried:
    level: "error"
    repositories: ["Iglu Client Embedded [embedded]","Iglu Central [HTTP]"]
, error: Unexpected exception fetching iglu:com.snowplowanalytics.snowplow/currency_conversion_config/jsonschema/1-0-0 in HTTP Iglu repository Iglu Central: java.io.IOException: Server returned HTTP response code: 500 for URL: http://iglucentral.com/schemas/com.snowplowanalytics.snowplow/currency_conversion_config/jsonschema/1-0-0
    level: "error"
)
    level: "error"
)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:140)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:140)
	at scalaz.Validation$class.fold(Validation.scala:64)
	at scalaz.Failure.fold(Validation.scala:330)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob.<init>(EtlJob.scala:139)
	... 16 more

And here are the contents of the same file for an EMR cluster created at 2016-07-13 13:23 (UTC-4):

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.twitter.scalding.Job$.apply(Job.scala:47)
	at com.twitter.scalding.Tool.getJob(Tool.scala:48)
	at com.twitter.scalding.Tool.run(Tool.scala:68)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner$.main(JobRunner.scala:33)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner.main(JobRunner.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: com.snowplowanalytics.snowplow.enrich.common.FatalEtlError: NonEmptyList(error: NonEmptyList(error: Could not find schema with key iglu:com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0 in any repository, tried:
    level: "error"
    repositories: ["Iglu Client Embedded [embedded]","Iglu Central [HTTP]"]
, error: Unexpected exception fetching iglu:com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0 in HTTP Iglu repository Iglu Central: java.io.IOException: Server returned HTTP response code: 500 for URL: http://iglucentral.com/schemas/com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0
    level: "error"
)
    level: "error"
)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:140)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:140)
	at scalaz.Validation$class.fold(Validation.scala:64)
	at scalaz.Failure.fold(Validation.scala:330)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob.<init>(EtlJob.scala:139)
	... 16 more
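
In case it helps with triage, here is a rough sketch of how the failing schema keys and HTTP status codes could be pulled out of stderr files like the two above. It is only an illustration, not part of EmrEtlRunner: the function names (`summarize_stderr`, `summarize_file`) are hypothetical, and the regular expressions assume the error wording stays exactly as it appears in these traces.

```python
import gzip
import re

# Assumption: the message wording matches the traces pasted above.
MISSING_SCHEMA = re.compile(r"Could not find schema with key (iglu:\S+)")
HTTP_FAILURE = re.compile(
    r"Server returned HTTP response code: (\d{3}) for URL: (\S+)"
)


def summarize_stderr(text):
    """Collect unresolved schema keys and HTTP failures from stderr text."""
    missing = sorted(set(MISSING_SCHEMA.findall(text)))
    failures = sorted(
        {(int(code), url) for code, url in HTTP_FAILURE.findall(text)}
    )
    return {"missing_schemas": missing, "http_failures": failures}


def summarize_file(path):
    """Hypothetical helper: read a gzipped stderr file and summarize it."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return summarize_stderr(fh.read())
```

Calling `summarize_file` on a locally downloaded copy of each step's stderr.gz would give one small dictionary per run, which makes it easier to see whether a failure was a server-side error from Iglu Central (a 5xx status) rather than a genuinely missing schema.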

Let me know if there is any more information I can provide that would be useful!