Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down issues


#1

Today for the first time I started seeing the error below with S3 during the Archive Raw step.

Error: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: A4DBBE2A70442C4D)

After waiting for over two hours to see if it’d recover, I killed the EMR cluster and plan to restart from that step.

Looking at the AWS forums, I noticed that they recommend implementing a backoff approach.

Does Snowplow support this scenario, or is there a plan to?
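For reference, the backoff approach AWS recommends can be sketched as a shell wrapper around whatever S3 call is being throttled. The wrapper name, attempt count and delays below are illustrative assumptions, not anything Snowplow or the AWS CLI ships:

```shell
#!/usr/bin/env bash
# Sketch: retry a command with exponential backoff plus random jitter.
# retry_with_backoff <max_attempts> <command...>  -- illustrative only.
retry_with_backoff() {
  local max_attempts=$1; shift
  local attempt=1 delay=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    # wait delay seconds plus up to delay seconds of jitter, then double the delay
    sleep $(( delay + RANDOM % delay ))
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# e.g. retry_with_backoff 5 aws s3 cp s3://my-bucket/some-key ./some-key
```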

UPDATE 1: It does look like the files got lost, because the S3 bucket has no files left to move.

Error: java.lang.RuntimeException: Reducer task failed to copy 1061 files: s3://xxxxxxx-raw-events-xxxxxxxxxx/processing/2017-10-18-49577239542950406222078834590292547004774673200395059202-49577239542950406222078834703261829144483314152364834818.lzo etc

UPDATE 2: After restarting the EMR job with the options --skip staging,enrich,shred,elasticsearch, aiming to resume from the S3 Raw Archive step, I got the failure below. The raw processing bucket has 0 files, as mentioned above.

Exception in thread "main" java.lang.RuntimeException: Error running job
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-28-218.us-west-2.compute.internal:8020/tmp/5d7dd4d2-7781-43ec-8cd2-27f70e907af4/files
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:317)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:901)
	... 10 more

UPDATE 3: I now realize that I can't run the Redshift and Postgres loaders, because I can't start from that step without any raw files to archive.


#2

Hey @cmartins - in this case, assuming that the raw events have been successfully archived, you can just start from the next step (rdb_load).
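For example, a resume invocation might look like the one below. The step names and config paths are assumptions from memory and can differ between EmrEtlRunner releases, so double-check them against your version:

```shell
# Hypothetical: skip every step up to and including archive_raw so the run
# effectively starts at rdb_load. Verify the exact --skip step names for
# your EmrEtlRunner release before running this.
./snowplow-emr-etl-runner run \
  --config config/config.yml \
  --resolver config/resolver.json \
  --skip staging,enrich,shred,elasticsearch,archive_raw
```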


#3

Someone ran into the same issue here.

I replied with a strategy to avoid this sort of thing.