We have recently started upgrading our pipeline (enricher, Iglu, shredder, RDB Loader) from older versions to the latest ones. For the most part this went well, but when we run the RDB Loader, triggered by a successful shredder run, we get this error:
ERROR Loader: Loading of s3://our-bucket/events/shredded/2022-03-07-16-00/run=2022-03-07-16-00/ has failed. Not adding into retry queue. SQL `NULL` read at column 1 (JDBC type Char) but mapping is to a non-Option type; use Option here. Note that JDBC column indexing is 1-based.
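As far as we understand it, this message comes from the JDBC mapping layer (doobie, which RDB Loader is built on): some query returned SQL NULL in its first column, and that column was decoded as a required value rather than an optional one. The Python sketch below is only an analogy to illustrate the failure mode, not the loader's actual code; the function and exception names are hypothetical.

```python
from typing import Optional, Sequence


class NonNullableColumnRead(Exception):
    """Hypothetical analogue of the 'NULL read ... non-Option type' error."""


def read_required(row: Sequence[Optional[str]], index: int) -> str:
    # JDBC column indexing is 1-based, as the error message points out.
    value = row[index - 1]
    if value is None:
        # A required (non-Option) mapping cannot represent SQL NULL, so it fails.
        raise NonNullableColumnRead(
            f"SQL NULL read at column {index} but mapping is to a non-Option type"
        )
    return value


def read_optional(row: Sequence[Optional[str]], index: int) -> Optional[str]:
    # An Option-style mapping simply passes NULL through as None.
    return row[index - 1]


row = (None, "2022-03-07-16-00")
print(read_optional(row, 1))  # None
print(read_required(row, 2))  # 2022-03-07-16-00
```

In other words, the error says nothing about which table is involved, only that a query the loader relies on returned NULL where it expected a value.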
According to this topic, the likely cause lies in the manifest table. That does not seem to be the case here: we deleted and recreated the table from scratch, and the loader still fails with the same error message.
Other things of note:
- There are NO queries at all in the Redshift logs for a failed run
- If we run the RDB Loader against a newly created schema, it completes successfully. The error only occurs when we run it against our current schema, which already contains past events
- The manifest and event tables are at the latest versions (0.2.0, 0.11.0)
- The legacy EMR-ETL-Runner is able to successfully load events into the same schema
Software versions used:
- Iglu: 0.8.2
- enrich-kinesis: 3.0.0-rc44
- Shredder: 2.2.0
- RDB-Loader: 2.2.0
Any help or tips on how to solve the issue would be appreciated.
UPDATE: We have migrated the unmatched legacy tables for custom schemas via igluctl, to no avail. The same error still persists.