Enriched HDFS -> S3 step intermittent failure



On a few occasions, enriched events were not copied to S3 because of this error:

Exception in thread "main" java.io.IOException: Error opening job jar: /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar
    at org.apache.hadoop.util.RunJar.run(RunJar.java:160)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.util.zip.ZipException: zip file is empty
    at java.util.zip.ZipFile.open(Native Method)
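
A JAR is a ZIP archive. The "zip file is empty" message suggests that the file at that path existed but had zero bytes at the moment Hadoop's RunJar tried to open it. To confirm this on a node, you could use a minimal diagnostic sketch in Python like the one below. `jar_looks_usable` is a hypothetical helper and is not part of EMR or EmrEtlRunner. It only detects a bad JAR and does not explain why the file was empty.

```python
import os
import zipfile
import zlib


def jar_looks_usable(path: str) -> bool:
    """Return True if `path` is a non-empty file that reads as a valid ZIP/JAR.

    Hypothetical diagnostic helper: it checks for the condition behind
    "zip file is empty" but is not part of EMR or EmrEtlRunner.
    """
    # A missing path or a zero-byte file cannot be a usable JAR.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    # is_zipfile() looks for the ZIP end-of-central-directory record.
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as jar:
            # testzip() reads every member and returns the name of the first
            # corrupt one, or None if all CRC checks pass.
            return jar.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        return False


# Example, using the path from the stack trace:
# jar_looks_usable("/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar")
```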

Re-running EmrEtlRunner with the staging step skipped fixes the issue. I found some troubleshooting tips here:


I realise issues like this are hard to pin down. Still, I expect most users run their Snowplow pipeline from a scheduler such as Jenkins, and having to re-run it manually is not ideal because the pipeline does not recover on its own. Has anyone else run into this? Any ideas?
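
Re-running with staging skipped is what fixes the problem. One way to let a scheduled run recover without manual help is to wrap the runner in a retry loop that uses a different command for the retries. The Python sketch below illustrates this under some assumptions and is not a Snowplow feature. `run_with_retries` is hypothetical, and the commented command lines assume EmrEtlRunner's `--skip staging` option, so check them against the version you run.

```python
import subprocess
import time
from typing import Sequence


def run_with_retries(first_cmd: Sequence[str], retry_cmd: Sequence[str],
                     max_attempts: int = 3, delay_s: float = 60.0) -> int:
    """Run first_cmd once; while the run keeps failing, run retry_cmd instead.

    Hypothetical wrapper for a scheduler job (e.g. Jenkins). It returns the
    exit code of the last attempt, so the job still fails if every attempt
    fails.
    """
    code = subprocess.run(list(first_cmd)).returncode
    attempt = 1
    while code != 0 and attempt < max_attempts:
        time.sleep(delay_s)  # give a transient problem time to clear
        code = subprocess.run(list(retry_cmd)).returncode
        attempt += 1
    return code


# Assumed invocation; adjust paths and flags to your EmrEtlRunner setup:
# base = ["./snowplow-emr-etl-runner", "--config", "config.yml"]
# raise SystemExit(run_with_retries(base, base + ["--skip", "staging"]))
```

Skipping staging on the retries follows the manual fix described in the post. The first attempt has presumably already staged the raw files, so only the EMR part needs to run again. A blanket retry will also repeat runs after failures that are not transient, so you may want to limit it to this particular error.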

Here’s my config:

  emr:
    ami_version: 4.5.0
    region: eu-central-1
    jobflow_role: EMR_EC2_DefaultRole
    service_role: EMR_DefaultRole
    placement:
    ec2_subnet_id: subnet-[...]
    ec2_key_name: my_key
    bootstrap: []
    software:
      hbase:
      lingual:

    jobflow:
      master_instance_type: m4.large
      core_instance_count: 3
      core_instance_type: c3.4xlarge
      task_instance_count: 0
      task_instance_type: c4.large
      task_instance_bid:
    bootstrap_failure_tries: 3
    additional_info:
collectors:
  format: clj-tomcat
enrich:
  job_name: snowplow ETL
  versions:
    hadoop_enrich: 1.8.0
    hadoop_shred: 0.10.0
    hadoop_elasticsearch: 0.1.0
  continue_on_unexpected_error: false
  output_compression: GZIP