Strange Random Error on EmtEtlRunner

kjain · September 21, 2017, 1:16pm

OK…checking the logs of the FIRST time this happened, I see more detailed logs of the error with Excon:

Excon::Error::ServiceUnavailable (Expected(200) <=> Actual(503 Service Unavailable)
excon.error.response
 :body          => "<Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><RequestId>3F7E4F0929844761</RequestId><HostId>r4/0CYAG09ZaxxiSyXZJhAAEWGygJzwMGY30TPFj+pH4+LeIbkfXZKDTtsfxwbcpbS4PHu7DaP8=</HostId></Error>"
 :cookies       => [
]
:headers       => {
"Connection"       => "close"
"Content-Type"     => "application/xml"
"Date"             => "Thu, 14 Sep 2017 19:05:43 GMT"
"Server"           => "AmazonS3"
"x-amz-id-2"       => "r4/0CYAG09ZaxxiSyXZJhAAEWGygJzwMGY30TPFj+pH4+LeIbkfXZKDTtsfxwbcpbS4PHu7DaP8="
"x-amz-request-id" => "3F7E4F0929844761"
}
:host          => "in-stats-prod-archive.s3.amazonaws.com"
:local_address => "<internal-ip-of-emr-runner>"
:local_port    => 44096
:path          => "/raw/2017-09-14-49573367394269636484214968494006254435221012695805329410-49573367394269636484214968517991342696375263768313069570.lzo.index"
:port          => 443
:reason_phrase => "Slow Down"
:remote_ip     => "52.216.230.139"
:status        => 503
:status_line   => "HTTP/1.1 503 Slow Down\r\n"
):
uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/expects.rb:7:in `response_call'
uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/response_parser.rb:9:in `response_call'
uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:388:in `response'
uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:252:in `request'
uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/sax_parser_connection.rb:35:in `request'
uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/connection.rb:7:in `request'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/storage.rb:521:in `_request'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/storage.rb:516:in `request'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/requests/storage/copy_object.rb:32:in `copy_object'
uri:classloader:/gems/fog-1.25.0/lib/fog/aws/models/storage/file.rb:92:in `copy'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:622:in `block in retry_x'
org/jruby/ext/timeout/Timeout.java:117:in `timeout'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:621:in `retry_x'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:548:in `block in process_files'
org/jruby/RubyKernel.java:1295:in `loop'
uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:412:in `block in process_files'

Problem destroying processing/2017-09-14-49573367394269636484214968352265746714503765354478567426-49573367394269636484214968375920798226903222662291521538.lzo.index. Retrying.
Problem destroying processing/2017-09-14-49573367394269636484214968518667132229539841820571205634-49573367394269636484214968543292951175089846286953742338.lzo. Retrying.
Error running EmrEtlRunner, exiting with return code 1. StorageLoader not run

So it does seem related to the aws setup in some way? Hope this helps understand the issue…

The EMRETLRunner version I am currently using is: r83 (bald eagle)
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r83_bald_eagle.zip

I had tried to update the version earlier but that did result in breaking or conflicts hence I kept this version. The platform has been stable for the past 5-6 months until this happened.

UPDATE: I found that this error ha been logged in the the forums earlier as well. That error stemmed from the buckets and the EMR runner being in different regions; but I can confirm that all my buckets, rdb and runners are in the same region. Again, it seems to happen mostly when moving large amounts of data in step 10 of the pipeline (moving from processing to raw:archive). Assuming I can temporarily resume manually, what would be the command I would need to run? As per my above message, I tried it manually but no data got loaded into the rdb!

UPDATE 2: I found that many users of S3 have been seeing this message during large transfers, especially on that day (9/14). Here’s one forum describing it. Maybe it was a temporarily glitch? If so, how can I resume from where the runner left off?

Thanks again!

Topic		Replies	Views
EMR Job - Failing with SSL handshake? AWS batch pipeline (Legacy)	10	2708	December 5, 2019
EmrEtlRunner R92 hangs at initialization Enrichment	3	1745	October 5, 2017
EMRETLrunner error: Excon::Error::MovedPermanently (Expected(200) <=> Actual(301 Moved Permanently) For engineers	5	1205	May 26, 2017
Getting 500 Internal Server Error while running EmrEtlRunner AWS batch pipeline (Legacy)	4	1719	April 21, 2016
Sluice fails on staging/archiving with OpenSSL error "Unsupported record version Unknown-0.0" Enrichment	0	1453	May 25, 2017

Strange Random Error on EmtEtlRunner

Related Topics