Sluice fails on staging/archiving with OpenSSL error "Unsupported record version Unknown-0.0"



Hi guys,

These forums are always really helpful when I'm debugging issues with our pipeline, so when I hit a problem today that I couldn't find a solution for on here, I thought I'd share what fixed it for me.

Problem in a nutshell
When running EmrEtlRunner, the staging and archiving steps frequently throw the following exception:

Unsupported record version Unknown-0.0 (OpenSSL::SSL::SSLError)

The full stack trace looks like this:

Problem copying E1PCCDMHBQKO7J.2017-05-24-15.53913471.gz. Retrying.
Problem copying E1PCCDMHBQKO7J.2017-05-24-14.cced2637.gz. Retrying.
Problem copying E1PCCDMHBQKO7J.2017-05-24-15.40d7b262.gz. Retrying.
Problem copying E1PCCDMHBQKO7J.2017-05-24-15.f08b7e63.gz. Retrying.
Problem copying E1PCCDMHBQKO7J.2017-05-24-15.c3244a9a.gz. Retrying.
Problem copying E1PCCDMHBQKO7J.2017-05-24-16.4d18a5d5.gz. Retrying.
      x snowplow-in/E1PCCDMHBQKO7J.2017-05-24-16.84388665.gz
(t8)    MOVE snowplow-in/E1PCCDMHBQKO7J.2017-05-24-16.efe42df3.gz -> snowplow-process/E1PCCDMHBQKO7J.2017-05-24-16.efe42df3.gz
      x snowplow-in/E1PCCDMHBQKO7J.2017-05-24-16.ac5c9fd7.gz
(t6)    MOVE snowplow-in/E1PCCDMHBQKO7J.2017-05-24-17.296f24dd.gz -> snowplow-process/E1PCCDMHBQKO7J.2017-05-24-17.296f24dd.gz
      x snowplow-in/E1PCCDMHBQKO7J.2017-05-24-16.b4d05bb3.gz
(t1)    MOVE snowplow-in/E1PCCDMHBQKO7J.2017-05-24-17.6ef6d439.gz -> snowplow-process/E1PCCDMHBQKO7J.2017-05-24-17.6ef6d439.gz
F, [2017-05-25T11:43:01.545000 #30962] FATAL -- : 

Excon::Error::Socket (Unsupported record version Unknown-0.0 (OpenSSL::SSL::SSLError)):
    org/jruby/ext/openssl/SSLSocket.java:222:in `connect_nonblock'
    uri:classloader:/gems/excon-0.52.0/lib/excon/ssl_socket.rb:121:in `initialize'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:403:in `socket'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:100:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/mock.rb:48:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/instrumentor.rb:26:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:249:in `request'
    uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/sax_parser_connection.rb:35:in `request'
    uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/connection.rb:7:in `request'
    uri:classloader:/gems/fog-1.25.0/lib/fog/aws/storage.rb:521:in `_request'
    uri:classloader:/gems/fog-1.25.0/lib/fog/aws/storage.rb:516:in `request'
    uri:classloader:/gems/fog-1.25.0/lib/fog/aws/requests/storage/copy_object.rb:32:in `copy_object'
    uri:classloader:/gems/fog-1.25.0/lib/fog/aws/models/storage/file.rb:92:in `copy'
    uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:622:in `block in retry_x'
    org/jruby/ext/timeout/Timeout.java:117:in `timeout'
    uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:621:in `retry_x'
    uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:548:in `block in process_files'
    org/jruby/RubyKernel.java:1295:in `loop'
    uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:412:in `block in process_files'
Task 'CF Enrich staging': failed after 5m, 32s. Reason: the task exited with a value not specified in continue_job - 1 (task expects one of the following return codes to continue [0])
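
If you want a quick picture of how bad it is before digging in, here is a rough Python sketch that tallies the retry lines from a saved copy of the output above. It assumes the "Problem copying <file>. Retrying." wording shown in my log and just looks for the OpenSSL::SSL::SSLError string; the function name summarize_staging_log is my own and is not part of Sluice or EmrEtlRunner.

```python
import re

# Matches the retry lines exactly as they appear in the log above.
RETRY_RE = re.compile(r"Problem copying (\S+)\. Retrying\.")


def summarize_staging_log(text):
    """Count retries per file and report whether the SSL error appears.

    Hypothetical helper for triage only; it reads plain log text and
    does not talk to S3.
    """
    retries = {}
    for filename in RETRY_RE.findall(text):
        retries[filename] = retries.get(filename, 0) + 1
    return {
        "retries": retries,
        "ssl_error": "OpenSSL::SSL::SSLError" in text,
    }


if __name__ == "__main__":
    import sys

    summary = summarize_staging_log(sys.stdin.read())
    for filename, count in sorted(summary["retries"].items()):
        print(f"{count:3d}  {filename}")
    print("SSL error seen:", summary["ssl_error"])
```

Save it under any name you like and pipe the saved runner output into it on stdin. It only helps with spotting whether the retries are spread across all files or stuck on a few; it does not fix anything.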

It seemed to start failing very consistently after I updated our Ubuntu 16.04 host with the latest packages yesterday.

The steps also failed when I entered the command manually outside of our Factotum runner, so I tried copying the files across by hand, and that worked just fine. No matter how many times I re-ran the staging/archive steps, though, they just kept throwing these errors.

After hours of research and trial and error, I just restarted the host, and now everything is working swimmingly again…

Unfortunately it was that easy.

