Snowplow RDB Loader R32 released

We’re tremendously excited to announce RDB Loader R32 with long-awaited Redshift automigrations.
From now on, users don’t need to worry about generating DDL files and JSONPaths or creating tables through psql. Everything is managed automatically.

Automigrations

Automigrations are the biggest feature of this release. To get rid of manually generated DDL files, we had to start producing shredded data as TSV instead of JSON. This allowed us to drop the JSONPaths files on S3 and makes shredded data compatible with Redshift tables as is.
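
To illustrate why positional TSV needs no JSONPaths file, here is a minimal Python sketch. It is not RDB Shredder's actual code: the `COLUMNS` order, the `to_tsv_row` helper and the sample entity are all hypothetical, and the real column order is derived from the schema by the Shredder itself.

```python
import json

# Hypothetical column order. This sketch simply assumes one is given.
COLUMNS = ["id", "name", "price"]


def to_tsv_row(entity, columns):
    """Render one JSON object as a tab-separated line in a fixed column order."""
    cells = []
    for column in columns:
        value = entity.get(column)
        # Missing or null properties become empty cells
        cells.append("" if value is None else str(value))
    return "\t".join(cells)


# Key order in the JSON does not matter; the column list decides the layout
entity = json.loads('{"name": "mug", "id": "sku-1", "price": 9.5}')
row = to_tsv_row(entity, COLUMNS)
print(row)
```

Because every value lands at a fixed position, the row can be copied into a table whose columns follow the same order, with no per-schema mapping file in between.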

Upgrading

You can find out how to upgrade RDB Loader on our docs website.

Other improvements

  • RDB Shredder generates bad rows using the snowplow-badrows library, which means it’s guaranteed that bad rows are compatible with any recovery job using the same library
  • We’ve improved the cross-batch deduplication algorithm when used with the RT pipeline. However, we still receive reports about enriched data missing in Redshift, so we discourage our users from using cross-batch deduplication along with Stream Enrich.

Known issues

  • When loading TSV data, RDB Loader treats empty strings as NULL values. Enriched data may contain empty strings in required columns, so Loader would attempt to load NULL into columns with a NOT NULL constraint, and the load will fail. To avoid that, make sure that all non-nullable, required string properties also have a minLength: 1 constraint in their JSON Schema.
  • Some of our users report a discrepancy between enriched and shredded data when cross-batch deduplication is enabled: small portions of data don’t reach Redshift. We’re investigating this problem and will release a fix as soon as possible.
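
For the first known issue, a quick way to audit a schema is to list required, non-nullable string properties that lack minLength: 1. The Python sketch below is only an illustration. The `required_strings_without_min_length` helper and the sample schema are hypothetical, and it inspects top-level properties only, ignoring nested objects and keywords such as enum.

```python
def required_strings_without_min_length(schema):
    """Return names of required, non-nullable string properties lacking minLength >= 1."""
    required = schema.get("required", [])
    properties = schema.get("properties", {})
    flagged = []
    for name in required:
        prop = properties.get(name, {})
        types = prop.get("type")
        # "type" may be a single string or a list of types
        type_list = [types] if isinstance(types, str) else (types or [])
        # Skip non-strings, and nullable strings, which may legitimately hold NULL
        if "string" not in type_list or "null" in type_list:
            continue
        if prop.get("minLength", 0) < 1:
            flagged.append(name)
    return flagged


# Hypothetical schema used only for this example
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "minLength": 1},
        "name": {"type": "string"},
        "note": {"type": ["string", "null"]},
        "count": {"type": "integer"},
    },
    "required": ["id", "name", "note", "count"],
}

flagged = required_strings_without_min_length(schema)
print(flagged)
```

Any property the helper reports is a candidate for an added minLength: 1 constraint, since an empty string in it would be treated as NULL on load.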

Roadmap

  • Performance improvements. Some of our users noticed that, starting with R30, RDB Shredder performs slower than before
  • Addressing the problems mentioned in the Known issues section is a top priority for the next release
  • The next version won’t have the umbrella R-prefix. Apps will share the same version