EMR ETL Runner Fails but EMR Succeeds

Hello,

Today my EmrEtlRunner run failed with the following log output:

F, [2018-10-22T22:41:26.686000 #1] FATAL -- :
RestClient::RequestTimeout (Request Timeout):
uri:classloader:/gems/rest-client-1.8.0/lib/restclient/request.rb:427:in `transmit'
uri:classloader:/gems/rest-client-1.8.0/lib/restclient/request.rb:176:in `execute'
uri:classloader:/gems/rest-client-1.8.0/lib/restclient/request.rb:41:in `execute'
uri:classloader:/gems/rest-client-1.8.0/lib/restclient.rb:69:in `post'
uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/aws_session.rb:29:in `submit'
uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/emr.rb:206:in `list_steps'
uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/job_flow.rb:194:in `cluster_step_status'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:818:in `wait_for'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:556:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
org/jruby/RubyKernel.java:979:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:961:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Yet the EMR job for that run succeeded, and I have verified that the data for that run exists in my Redshift cluster.

How do I go about debugging this and preventing this error in the future?

@frankcash, this is a networking issue. EmrEtlRunner lost connectivity to the EMR cluster while it was polling for step status (the timeout in the trace is raised from the list_steps call inside wait_for). Monitoring of the progress was therefore aborted, while the job on the EMR cluster continued to run and eventually completed. There is not much you can do to prevent this. When it does happen, confirm that the EMR job did complete, then remove the job lock if you use one.
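
If you want to script the "confirm that the EMR job did complete" check rather than looking in the AWS console, here is a minimal sketch in Python. It is not part of EmrEtlRunner; the function name all_steps_completed is made up for illustration, and it assumes you pass in a boto3 EMR client (or anything with a compatible list_steps method) plus the ID of the cluster that the failed run launched.

```python
def all_steps_completed(emr_client, cluster_id):
    """Return True only if the cluster has steps and every one is COMPLETED.

    emr_client is assumed to behave like boto3.client("emr"): list_steps
    returns a dict with a "Steps" list and, when more pages remain, a "Marker".
    """
    states = []
    kwargs = {"ClusterId": cluster_id}
    while True:
        page = emr_client.list_steps(**kwargs)
        states.extend(step["Status"]["State"] for step in page["Steps"])
        marker = page.get("Marker")
        if not marker:
            break
        # Ask for the next page of steps.
        kwargs["Marker"] = marker
    # An empty step list is treated as "not confirmed" rather than success.
    return bool(states) and all(state == "COMPLETED" for state in states)
```

With boto3 installed and AWS credentials configured, you would call it as all_steps_completed(boto3.client("emr"), "<your cluster id>"). Only remove the job lock once this returns True and you have checked that the data has landed in Redshift.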