RDB Loader 1.0.0 "Database error: [Amazon][JDBC](10060) Connection has been closed"

Hi all,

We have the following Snowplow setup:
- AWS account A with an S3 bucket for the shredded Snowplow events and an EC2 instance on which we run the RDB Loader Docker container
- AWS account B with a Redshift cluster

The EC2 instance from account A is talking to Redshift through a VPC endpoint and the Redshift cluster has a role which allows access to the S3 bucket in account A.
This communication works: from the EC2 instance we can establish a JDBC connection to the Redshift cluster and inspect its tables and data. But when RDB Loader tries to execute its COPY command, we get the following error:

```
INFO 2021-05-19 14:54:01.345: COPY atomic.events
RDB Loader unknown error in executeUpdate
java.sql.SQLNonTransientConnectionException: [Amazon]JDBC Connection has been closed.
at com.amazon.exceptions.ExceptionConverter.toSQLException(Unknown Source)
at com.amazon.jdbc.common.SConnection.checkIfOpen(Unknown Source)
at com.amazon.jdbc.common.SConnection.setAutoCommit(Unknown Source)
at com.amazon.redshift.core.jdbc42.S42NotifiedConnection.setAutoCommit(Unknown Source)
at doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$setAutoCommit$1(kleisliinterpreter.scala:800)
at doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$setAutoCommit$1$adapted(kleisliinterpreter.scala:800)
at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:109)
at blockOn$extension @ doobie.free.KleisliInterpreter.$anonfun$primitive$1(kleisliinterpreter.scala:112)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:750)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at bracketCase @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:750)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at bracketCase @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:750)
at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
at use @ com.snowplowanalytics.snowplow.rdbloader.Main$.run(Main.scala:36)
```

Any help will be appreciated :slight_smile:

Hi @Deyan_Deyanov ,

I think that the issue is that your Redshift cluster in account B is not able to read data in your S3 bucket in account A.

RDB Loader connects to Redshift to run the COPY statement, but then Redshift itself needs to access the data on S3.
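To illustrate the point, the failing COPY is of this general shape. The bucket path and role ARN below are placeholders, not what RDB Loader actually generates, and the exact options vary by loader version; what matters is that the S3 read in this statement is performed by the Redshift cluster, not by your EC2 instance:

```sql
-- Illustrative sketch only: Redshift fetches the files from S3 itself,
-- authenticating with the IAM role named in the statement.
COPY atomic.events
FROM 's3://example-shredded-bucket/path/to/run/'
IAM_ROLE 'arn:aws:iam::111111111111:role/example-redshift-s3-role';
```

So your JDBC connectivity test from the EC2 instance passing does not tell you anything about whether this S3 read will succeed.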

You might want to take a look at this page:

https://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-access-permissions.html
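In a cross-account setup like yours, the bucket in account A typically also needs a bucket policy allowing the role attached to the Redshift cluster in account B. A minimal sketch, where the account ID, bucket name, and role name are all made up for illustration:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRedshiftRoleFromAccountB",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:role/example-redshift-s3-role"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-shredded-bucket",
        "arn:aws:s3:::example-shredded-bucket/*"
      ]
    }
  ]
}
```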

Thank you @BenB :slight_smile:

We managed to track down the issue and it was indeed related to the Redshift cluster's permissions. It took us some time to understand what exactly the problem was: it turned out that the Redshift and S3 policies were actually fine, but our Redshift user did not have permission to assume the IAM role used for copying data from S3.

We solved that with a `GRANT ASSUMEROLE ON 'our IAM role' TO 'user' FOR ALL`, following the documentation here: GRANT - Amazon Redshift
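For anyone hitting the same problem, the statement we ran was of this shape (the role ARN and user name below are placeholders for our actual values):

```sql
-- Placeholder ARN and user name.
-- FOR ALL grants the role for all supported commands (COPY, UNLOAD, ...).
GRANT ASSUMEROLE
ON 'arn:aws:iam::111111111111:role/example-redshift-s3-role'
TO example_loader_user
FOR ALL;
```

Until this grant is in place, the COPY fails even though the role itself has all the S3 permissions it needs.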