RDB Loader fails loading data to Redshift when SSL is enabled


#1

Hi

We’re testing the upgrade from R80 to R93.

Our environment resides in an AWS VPC, and EmrEtlRunner is used as the collector; the Redshift cluster has SSL enabled in its parameter group (require_ssl=true). The EMR instances and the Redshift cluster reside within the same VPC subnet.

The EMR job fails when trying to load data into Redshift. From what I understand from the log (attached), this is because it cannot establish a connection to the cluster.

I’ve done the following setup as part of the upgrade process:

  1. To the Redshift security group I’ve added an inbound rule that allows connections from the SG of the master EMR node to its port.
  2. Created an IAM role that has RO access to S3 and assigned it to the Redshift cluster.

Trying to troubleshoot the problem, I did the following:

  1. Placed the events back in the “in” bucket and restarted the EMR job from the “staging” phase. While it was performing the initial steps, I logged in to the master instance. From there I issued a psql command with the same params as in the redshift.json file:
	psql -h <redshift_cluster>.<my_region>.redshift.amazonaws.com -U <user> -d <db> -p <port>
-> SSL connection was established and I was able to query it:
<db>=# select * from atomic.manifest;
 etl_tstamp | commit_tstamp | event_count | shredded_cardinality
------------+---------------+-------------+----------------------
(0 rows)

I then ran tcpdump on the Redshift port (sudo tcpdump dst port <port> -w tcpdump.log), but nothing was logged even though the rdb_load step took 1 min to fail. I couldn’t debug it further as the server was terminated afterwards.

  2. Tried downgrading the rdb_loader version from 0.13.0 to 0.12.0 and resumed: same error.

  3. Disabled the “require_ssl” setting in the Redshift parameter group, then resumed the EMR job from the rdb_load step (after setting the SSL mode to DISABLED in the redshift.json file). This time it succeeded:

I, [2017-10-10T20:32:24.615000 #14639]  INFO -- : RDB Loader successfully completed following steps: [Discover, Load, Analyze]
D, [2017-10-10T20:32:24.616000 #14639] DEBUG -- : EMR jobflow j-XXXXXX completed successfully.
I, [2017-10-10T20:32:24.617000 #14639]  INFO -- : Completed successfully
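
One further check that could be run from the master node is sketched below. Redshift speaks the PostgreSQL wire protocol, in which a client asks for SSL by sending an 8-byte SSLRequest message and the server answers with a single byte (S or N). The function names and the timeout are assumptions made for illustration; this is not part of RDB Loader, and it only shows whether the cluster is reachable and willing to negotiate SSL, which helps separate network problems from driver problems.

```python
import socket
import struct

# SSLRequest from the PostgreSQL wire protocol:
# Int32 message length (8) followed by Int32 request code 80877103.
SSL_REQUEST_CODE = 80877103


def build_ssl_request() -> bytes:
    """Return the 8-byte SSLRequest message in network byte order."""
    return struct.pack("!II", 8, SSL_REQUEST_CODE)


def interpret_ssl_response(reply: bytes) -> str:
    """Map the server's one-byte answer to a readable label."""
    if reply == b"S":
        return "ssl-supported"
    if reply == b"N":
        return "ssl-refused"
    return "unexpected"


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Open a TCP connection and ask the server whether it accepts SSL.

    host and port are placeholders for the cluster endpoint and port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_ssl_request())
        return interpret_ssl_response(sock.recv(1))
```

If probe() reports that SSL is supported while the loader still fails, the problem is more likely in the client's TLS setup (driver or trust store) than in the security group rules.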

I’ve attached the redshift.json & the global config.yml.

Any idea what the problem might be and how to solve it? Any further debug steps?

BTW, I guess it’s not related but worth mentioning: I’m using a test Redshift DB that was launched from a snapshot of the production cluster and uses an identical Redshift configuration (security group, subnet, param group etc).

Thanks a lot for your help!


#2

Hello @morans,

Yes, RDB Loader has known problems with SSL connections. Right now we’re testing the upcoming 0.14.0 release, which addresses various security issues, including a driver update, SSH tunnels and general security hardening.

We’ll let you know when it’s released. Right now, I believe the simplest option you have is either to temporarily disable the SSL requirement or to downgrade to StorageLoader and wait some time (hopefully less than a week) for the official 0.14.0 release.

UPD: from your EmrEtlRunner traceback I see that your connection error is [Amazon](600000) Error setting/closing connection: General SSLEngine problem., which is fixed in 0.14.0 by bumping the Redshift JDBC driver.
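
For context, the SSL behaviour of the Redshift JDBC driver is controlled through connection URL options such as ssl and sslmode. The sketch below shows how such a URL could be assembled. The mode labels and the build_jdbc_url helper are hypothetical names used for illustration, and the mapping is an assumption, not RDB Loader's actual code.

```python
from urllib.parse import urlencode

# Illustrative mapping from a mode label to Redshift JDBC URL options.
# The labels on the left are hypothetical; the option names on the right
# (ssl, sslmode) are the ones the Redshift JDBC driver documents.
SSL_OPTIONS = {
    "disable": {},
    "require": {"ssl": "true"},
    "verify-ca": {"ssl": "true", "sslmode": "verify-ca"},
    "verify-full": {"ssl": "true", "sslmode": "verify-full"},
}


def build_jdbc_url(host: str, port: int, database: str, mode: str) -> str:
    """Build a Redshift JDBC URL carrying the SSL options for the given mode."""
    options = SSL_OPTIONS[mode]  # raises KeyError for an unknown mode
    base = f"jdbc:redshift://{host}:{port}/{database}"
    return f"{base}?{urlencode(options)}" if options else base
```

A URL built with one of the verify modes only works if the JVM trusts the certificate presented by the cluster, which is why an outdated driver or CA bundle can surface as an SSLEngine error.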


#4

Thanks a lot Anton for your prompt response.
The reason we’re upgrading now is that AWS is replacing the Redshift SSL certificate and they requested us to “replace your existing Certificate Authority Bundle by October 23rd, 2017 to avoid service interruption”.
I’m almost certain that disabling SSL is not an option, so I was wondering whether upgrading to the latest StorageLoader would be good enough.
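
As a side note, one way to see what a given CA bundle file contains is sketched below using only the Python standard library. The helper names are assumptions made for illustration, and any path passed to bundle_expiry_dates is a placeholder for wherever the bundle is stored.

```python
import re
import ssl
from typing import List

# Matches each PEM-encoded certificate block inside a bundle.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(text: str) -> List[str]:
    """Return the individual PEM certificate blocks found in the text."""
    return PEM_CERT.findall(text)


def bundle_expiry_dates(cafile: str) -> List[str]:
    """Load a CA bundle file and return the notAfter field of each CA cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return [cert["notAfter"] for cert in ctx.get_ca_certs()]
```

Comparing the expiry dates of the old and new bundles gives a quick confirmation that the replacement file is the one actually in use.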

Thanks again,
Moran


#5

@morans yes, we’re aware of that change, and 0.14 addresses this issue first and foremost.

The good news for you is that 0.14.0 will definitely be available before October 23rd, so you can wait until it is released and switch to it.


#6

Perfect, thank you!!


#7

Hi @morans,

Just a little update on this. RDB Loader 0.14.0 will be published soon along with R95, but if you are pressed for time you can try upgrading to RDB Loader 0.14.0-rc2. It should work with R90+ without any additional changes, but if you encounter problems you’ll also need to update rdb_shredder to 0.13.0-rc2 and amiVersion to 5.9.0. Sorry for the short notice.


#8

Thanks Anton!

Good news: replacing only the RDB Loader still failed to load the data into Redshift,
but changing the AMI version and rdb_shredder did the trick.

In case it’s needed, here’s the exception I received after replacing only the RDB Loader:

Exception in thread "main" java.lang.IllegalAccessError: tried to access class com.amazonaws.services.s3.AmazonS3ClientConfigurationFactory from class com.amazonaws.services.s3.AmazonS3Builder
at com.amazonaws.services.s3.AmazonS3Builder.(AmazonS3Builder.java:30)
at com.snowplowanalytics.snowplow.rdbloader.interpreters.implementations.S3Interpreter$.getClient(S3Interpreter.scala:49)
at com.snowplowanalytics.snowplow.rdbloader.interpreters.Interpreter$.initialize(Interpreter.scala:37)
at com.snowplowanalytics.snowplow.rdbloader.Main$.run(Main.scala:54)
at com.snowplowanalytics.snowplow.rdbloader.Main$.main(Main.scala:35)
at com.snowplowanalytics.snowplow.rdbloader.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)