RDB loader failed. SQL `NULL` read at column 1

Hi I’m able to setup entire pipeline but last bit RDB loader is failing with

[ioapp-compute-1] INFO Could not load a folder (base s3://snowplow-shred/shredded/good/run=2022-01-13-03-30-00/), trying to ack the SQS command
[ioapp-compute-0] ERROR Loader shutting down
doobie.util.invariant$NonNullableColumnRead: SQL `NULL` read at column 1 (JDBC type Char) but mapping is to a non-Option type; use Option here. Note that JDBC column indexing is 1-based.
	at doobie.util.Get.unsafeGetNonNullable(get.scala:29)
	at doobie.util.Read$.$anonfun$fromGet$1(read.scala:74)
	at doobie.util.Read$.$anonfun$fromGet$1$adapted(read.scala:74)
	at doobie.util.Read.$anonfun$get$1(read.scala:55)
	at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:109)
	at blockOn$extension @ doobie.free.KleisliInterpreter.$anonfun$primitive$1(kleisliinterpreter.scala:112)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter.$anonfun$embed$12(kleisliinterpreter.scala:131)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at blockOn$extension @ doobie.free.KleisliInterpreter.$anonfun$primitive$1(kleisliinterpreter.scala:112)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter.$anonfun$embed$12(kleisliinterpreter.scala:131)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter.$anonfun$embed$12(kleisliinterpreter.scala:131)

and shredding_complete.json file looks like below

{"schema":"iglu:com.snowplowanalytics.snowplow.storage.rdbloader/shredding_complete/jsonschema/1-0-1","data":{"base":"s3://snowplow-shred/shredded/good/run=2022-01-13-03-30-00/","types":[{"schemaKey":"iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0","format":"TSV"},{"schemaKey":"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0","format":"TSV"}],"timestamps":{"jobStarted":"2022-01-13T03:40:33.056Z","jobCompleted":"2022-01-13T03:42:27.638Z","min":"2022-01-11T12:25:18.867Z","max":"2022-01-12T10:21:46.656Z"},"compression":"GZIP","processor":{"artifact":"snowplow-rdb-shredder","version":"2.0.0"},"count":{"good":1078}}}

My folders and files structure as below

s3://snowplow-shred/shredded/good/run=2022-01-13-03-30-00/
|->shredding_complete.json
|->_SUCCESS
|->output=good
    |->vendor=com.snowplowanalytics.snowplow
        |->name=atomic
            |->format=tsv
                |->model=1
                     |->*.gz files
        |->name=duplicate
            |->format=tsv
                |->model=1
                     |->*.gz files
        |->name=web_page
            |->format=tsv
                |->model=1
                     |->*.gz files

Not sure what’s happening, any help appreciated.

Hi @pramod.niralakeri,

It seems you’re very close to be done with loading and your shredding_complete.json as well as folder structure look completely valid to me.

Few questions:

  1. What tables exist in your schema? Do events, com_snowplowanalytics_snowplow_duplicate_1 and com_snowplowanalytics_snowplow_web_page_1 exist?
  2. Can you show few lines above Could not load a folder message?
  3. Do you have manifest or manifest_legacy tables? How do they look?
  4. Did you try this only once or several times (asking because we’ve seen it in production couple of times and always considered it some weird transient issue).

Thanks for reply @anton

  1. I’ve only events table, other two don’t
  2. There no lines above the Could not load a folder, except listening to SQS msg
  3. Don’t have manifest or manifest_legacy
  4. Tried multiple times, and every time failed

bump up thread

@anton/@mike could you please help,

Now I’ve created

events, 
com_snowplowanalytics_snowplow_duplicate_1
com_snowplowanalytics_snowplow_web_page_1

tables, but issue still persist, no luck.

Logs


[ioapp-compute-0] INFO Received new message. Total 1 messages received, 0 loaded, 0 attempts has been made to load current folder
[ioapp-compute-0] INFO New data discovery at run=2022-01-17-15-00-00 with following shredded types:
  * iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-*-* TSV
  * iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-*-* TSV
  * iglu:com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-*-* TSV
[ioapp-compute-0] WARN Releasing JDBC connection
[ioapp-compute-0] INFO Could not load a folder (base s3://snowplow-shred/shredded/good/run=2022-01-17-15-00-00/), trying to ack the SQS command
[ioapp-compute-1] ERROR Loader shutting down
doobie.util.invariant$NonNullableColumnRead: SQL `NULL` read at column 1 (JDBC type Char) but mapping is to a non-Option type; use Option here. Note that JDBC column indexing is 1-based.
	at doobie.util.Get.unsafeGetNonNullable(get.scala:29)
	at doobie.util.Read$.$anonfun$fromGet$1(read.scala:74)
	at doobie.util.Read$.$anonfun$fromGet$1$adapted(read.scala:74)
	at doobie.util.Read.$anonfun$get$1(read.scala:55)
	at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:109)
	at blockOn$extension @ doobie.free.KleisliInterpreter.$anonfun$primitive$1(kleisliinterpreter.scala:112)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter.$anonfun$embed$12(kleisliinterpreter.scala:131)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at blockOn$extension @ doobie.free.KleisliInterpreter.$anonfun$primitive$1(kleisliinterpreter.scala:112)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter.$anonfun$embed$12(kleisliinterpreter.scala:131)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:167)
	at tailRecM @ retry.package$RetryingOnSomeErrorsPartiallyApplied.apply(package.scala:96)
	at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter.$anonfun$embed$12(kleisliinterpreter.scala:131)

RDB version : snowplow/snowplow-rdb-loader:2.0.0

@pramod.niralakeri, there’s something that doesn’t add up in this info.

I’m fairly certain that the origin of the problem is somewhere around manifest table. The problem is that I don’t think it’s possible that no additional logging messages appear about manifest (must be this or this or this - right after “listening SQS queue”) AND no manifest table exist.

I’d advise you to do one of the following things:

  1. Manually create legacy manifest: snowplow/manifest-def.sql at master · snowplow/snowplow · GitHub (bear in mind the schema - it must match yours in case it’s not atomic)
  2. Try out snowplow/snowplow-rdb-loader:2.1.0-rc4 (there’s a tiny change that would at least improve logging)

If both don’t make any change - I’ll provide you a DDL of new manifest table (don’t have it at hand at the moment).

@anton

  1. manifest table is created automatically with columns
base,types,shredding_started,shredding_completed,min_collector_tstamp,max_collector_tstamp,ingestion_tstamp,compression,processor_artifact,processor_version,count_good
  1. using 2.1.0-rc4 results same error, but slight change in log, complete log below
INFO Main: RDB Loader 2.1.0-rc4 has started. Listening snowplow-shredder-loader.fifo
INFO HikariPool-1 - Starting...
INFO HikariPool-1 - Driver does not support get/set network timeout for connections. ([Amazon][JDBC](10220) Driver does not support this optional feature.)
INFO HikariPool-1 - Start completed.
INFO FolderMonitoring: Configuration for monitoring.folders hasn't been provided - monitoring is disabled
INFO DataDiscovery: Received a new message
INFO DataDiscovery: Total 1 messages received, 0 loaded
INFO DataDiscovery: New data discovery at run=2022-01-17-15-30-00 with following shredded types:
  * iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-*-* TSV
  * iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-*-* TSV
  * iglu:com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-*-* TSV
ERROR Loader: Loader shutting down
INFO HikariPool-1 - Shutdown initiated...
INFO HikariPool-1 - Shutdown completed.
INFO Shutting down connection pool: curAllocated=1 idleQueues.size=1 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
java.lang.IllegalStateException: Cannot set Cancelling(SQL `NULL` read at column 1 (JDBC type Char) but mapping is to a non-Option type; use Option here. Note that JDBC column indexing is 1-based.) stage while loading is Idle. Current state is State(Idle,2022-01-18T03:05:04.160190Z,0,Map(),0,1)
	at com.snowplowanalytics.snowplow.rdbloader.state.Control.$anonfun$setStage$2(Control.scala:65)
	at fs2.concurrent.SignallingRef$SignallingRefImpl.$anonfun$update$1(Signal.scala:238)
	at fs2.concurrent.SignallingRef$SignallingRefImpl.$anonfun$modify$1(Signal.scala:242)
	at flatten$extension @ fs2.concurrent.SignallingRef$SignallingRefImpl.modify(Signal.scala:249)
	at instantNow @ com.snowplowanalytics.snowplow.rdbloader.state.State$.mk(State.scala:95)
	at flatMap @ com.snowplowanalytics.snowplow.rdbloader.state.Control.setStage(Control.scala:55)
	at uncancelable @ org.http4s.client.ConnectionManager$.pool(ConnectionManager.scala:83)
	at uncancelable @ org.http4s.client.ConnectionManager$.pool(ConnectionManager.scala:83)
	at uncancelable @ org.http4s.client.ConnectionManager$.pool(ConnectionManager.scala:83)
	at traverse @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
	at traverse @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
	at traverse @ fs2.concurrent.SignallingRef$SignallingRefImpl.$anonfun$modify$1(Signal.scala:245)
	at uncancelable @ org.http4s.client.ConnectionManager$.pool(ConnectionManager.scala:83)
	at flatten$extension @ fs2.concurrent.SignallingRef$SignallingRefImpl.modify(Signal.scala:249)
	at instantNow @ com.snowplowanalytics.snowplow.rdbloader.state.State$.mk(State.scala:95)
	at flatMap @ com.snowplowanalytics.snowplow.rdbloader.state.Control.$anonfun$makeBusy$1(Control.scala:95)

Do you happen to have query logs turn on for your database? The queries that are getting executed against the database might show something. I haven’t seen this error before but it feels like it’s trying to update a manifest row that either doesn’t exist or it can’t fetch it.