R90 storage loading problems


#1

Just upgraded to R90 this morning and I’m having some issues. The logs don’t tell me exactly what is wrong. I added the master to our VPC group for Redshift, and also the Redshift Load Role, and of course updated the configs as well (EMR + target configs). Also, which --skip flags do I include in this new version to run ONLY the storage loader through EmrEtlRunner? I’ve tried a whole bunch of different combos and nothing works.

I, [2017-08-01T19:04:41.784000 #25727]  INFO -- : ERROR: Data loading error Problems with establishing DB connection
[Amazon](600000) Error setting/closing connection: General SSLEngine problem.
Following steps completed: [Discover]

Target config:

{
  "schema": "iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/2-0-0",
  "data": {
    "name": "Data Warehouse",
    "host": "***********",
    "database": "dev",
    "port": 5439,
    "sslMode": "REQUIRE",
    "schema": "atomic",
    "username": "snowplow",
    "password": "********",
    "roleArn": "*******",
    "maxError": 1,
    "compRows": 200000,
    "purpose": "ENRICHED_EVENTS"
  }
}

#2

Hello @mjensen,

Sorry to hear about it.

  1. To run only the RDB Loader step you need to skip the following steps: staging,s3distcp,enrich,shred,elasticsearch,archive_raw,analyze,archive_enriched. Basically, everything except rdb_load.
  2. The logs tell us that RDB Loader cannot establish a connection to Redshift due to an “SSLEngine problem”. Unfortunately, this can mean either general unavailability (due to a firewall) or a real SSL problem (we’ve changed the JDBC driver to the Redshift-native one in this release and I’ve encountered many differences).

To diagnose why RDB Loader cannot establish a connection, I propose first checking whether the problem is in the JDBC driver by changing sslMode to DISABLE, if that is possible. If that doesn’t help, then there’s something in the AWS security settings (by the way, can you confirm that roleArn has the form arn:aws:iam::719197435995:role/RedshiftLoadRole and not just RedshiftLoadRole?).
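The two non-driver causes mentioned above (host unreachable, roleArn given as a bare role name) can be ruled out before touching sslMode. Below is a minimal sketch in Python, not part of Snowplow: the function names are hypothetical, and the ARN pattern is only inferred from the example ARN in this post. A successful TCP connection only suggests that the firewall is not the culprit; it says nothing about whether the SSL handshake will work.

```python
import re
import socket

# Shape inferred from the example in this thread:
# arn:aws:iam::<12-digit account id>:role/<role name or path>
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+")


def looks_like_role_arn(value: str) -> bool:
    """Return True if value has the full IAM role ARN shape, not just a role name."""
    return ROLE_ARN_PATTERN.fullmatch(value) is not None


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Run from the machine that needs the connection, something like `port_reachable("<your Redshift host>", 5439)` returning False would point at networking rather than at the JDBC driver.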


#3

@anton thanks, trying these now. I tried that whole list for --skip as well and keep getting this:

Cannot process Thrift events with --skip s3distcp

And yes, roleArn has the whole string from AWS. I just zeroed it out for the post.


#4

@anton disabling SSL in the storage config worked, thanks.

I, [2017-08-02T15:03:06.457000 #23538] INFO -- : RDB Loader logs
D, [2017-08-02T15:03:06.904000 #23538] DEBUG -- : Downloading s3://ga-snowplow-production/snowplow-log/rdb-loader/2017-08-02-14-24-52/df0c1606-e549-40b2-a297-aaec08f69c7c to /var/app/current/enrichments/rdbloader20170802-23538-1srqb9i
I, [2017-08-02T15:03:08.933000 #23538] INFO -- : Data Warehouse
I, [2017-08-02T15:03:08.933000 #23538] INFO -- : RDB Loader successfully completed following steps: [Discover, Load, Analyze]
D, [2017-08-02T15:03:08.933000 #23538] DEBUG -- : EMR jobflow j-C1QGLG3LMFDA completed successfully.

Keep us updated on when SSL will be supported and I can try again. We’d prefer SSL enabled.


#5

@anton still getting this:

$ /var/app/current/bin/snowplow-emr-etl-runner --config /var/app/current/etc/emr-config.yml --resolver /var/app/current/etc/resolver.conf  --enrichments /var/app/current/enrichments --targets /var/app/current/etc/storage_targets --skip staging,s3distcp,enrich,shred,elasticsearch,archive_raw,analyze,archive_enriched

Cannot process Thrift events with --skip s3distcp
Error running EmrEtlRunner, exiting with return code 1. StorageLoader not run

Update: taking out s3distcp worked though. Thanks.

/var/app/current/bin/snowplow-emr-etl-runner --config /var/app/current/etc/emr-config.yml --resolver /var/app/current/etc/resolver.conf  --enrichments /var/app/current/enrichments --targets /var/app/current/etc/storage_targets --skip staging,enrich,shred,elasticsearch,archive_raw,analyze,archive_enriched
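For anyone scripting this, the --skip value can be derived rather than typed by hand. This is a rough sketch, not part of EmrEtlRunner: the step names are the ones quoted in this thread for R90, their ordering is an assumption, and `skip_argument` is a hypothetical helper. s3distcp is kept out of the skip list by default because skipping it was rejected in this setup ("Cannot process Thrift events with --skip s3distcp").

```python
# Step names as quoted in this thread for R90; the ordering is assumed.
R90_STEPS = [
    "staging", "s3distcp", "enrich", "shred", "elasticsearch",
    "archive_raw", "rdb_load", "analyze", "archive_enriched",
]


def skip_argument(run_only, never_skip=("s3distcp",)):
    """Build the --skip value: every known step except run_only and never_skip."""
    keep = set(run_only) | set(never_skip)
    unknown = keep - set(R90_STEPS)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    # Preserve the order of R90_STEPS so the output is stable.
    return ",".join(step for step in R90_STEPS if step not in keep)


print(skip_argument(["rdb_load"]))
# staging,enrich,shred,elasticsearch,archive_raw,analyze,archive_enriched
```

Passing `never_skip=()` reproduces the full list from the earlier attempt, including s3distcp.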

#6

Thanks for the update @mjensen. Though I’m quite puzzled about where this message comes from:

Error running EmrEtlRunner, exiting with return code 1. StorageLoader not run

R90 should not have StorageLoader mentioned anywhere.


#7

Not sure :)
[root@ip-* bin]# ./snowplow-emr-etl-runner --version
snowplow-emr-etl-runner 0.26.0

[root@ip-* bin]# ls -l snowplow-emr-etl-runner
-rwxr-xr-x 1 webapp webapp 36006486 Aug 8 18:12 snowplow-emr-etl-runner


#8

I think the “Error running EmrEtlRunner… StorageLoader not run” message comes from the simple orchestration shell script that used to be recommended for running StorageLoader after checking the return value of EmrEtlRunner.


#9

Hello @mjensen,

Just a heads-up that if you’re still loading Redshift via SSL, you may encounter problems after October 23rd, as AWS is replacing certificates. To fix this you’ll need to either use StorageLoader and the Redshift Certificate Authority Bundle, as described in the link above, or use pre-released assets. For the latter you’ll need to bump the following properties in your config.yml:

  • storage.versions.rdb_loader to 0.14.0-rc2
  • storage.versions.rdb_shredder to 0.13.0-rc2
  • aws.emr.amiVersion to 5.9.0

Final versions should be released soon.


#10

@anton thank you