Snowplow S3 Loader 2.1.0 released

We are pleased to announce version 2.1.0 of the Snowplow S3 Loader.

This version fixes a bug in version 2.0.0 in which the loader could hang during Kinesis scaling events and stop processing events. For this reason, we strongly recommend upgrading to 2.1.0 if you are currently running 2.0.0.

It also adds a new configuration option that gives you complete control over how S3 directories are partitioned, for example by date, by time, or by the schema type of self-describing JSONs. We expect this partitioning to be particularly helpful if you use Athena to query your Snowplow failed events in S3.

Version 2.1.0 builds on the 2.0.0 release, which included features such as observability over event counts and load latency, and a better configuration format.

Upgrading to 2.1.0

If you are already running version 2.0.0, you can switch to the 2.1.0 Docker image without any change to your configuration.

docker pull snowplow/snowplow-s3-loader:2.1.0

If you want to partition files by date or schema, set the output.s3.partitionFormat field in your configuration file. There is an example on GitHub, and more details are in the configuration reference.
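To make the idea of a partition format concrete, here is a minimal Python sketch of how a placeholder template could expand into an S3 directory prefix. It is not the loader's implementation: the function name, the placeholder names ({vendor}, {schema}, {yy}, {mm}, {dd}, {hh}) and the sample values are assumptions chosen for illustration, so check the configuration reference for the placeholders the loader actually supports.

```python
from datetime import datetime, timezone


def expand_partition_format(template: str, vendor: str, schema: str, when: datetime) -> str:
    """Expand a partition template into a directory prefix.

    Hypothetical helper for illustration only; the placeholder names
    are assumed, not taken from the loader's source code.
    """
    values = {
        "vendor": vendor,
        "schema": schema,
        "yy": f"{when.year:04d}",   # four-digit year
        "mm": f"{when.month:02d}",  # zero-padded month
        "dd": f"{when.day:02d}",    # zero-padded day
        "hh": f"{when.hour:02d}",   # zero-padded hour
    }
    return template.format(**values)


# Sample values are made up for the example.
prefix = expand_partition_format(
    "{vendor}.{schema}/date={yy}-{mm}-{dd}",
    vendor="com.acme",
    schema="checkout",
    when=datetime(2021, 6, 15, 9, 30, tzinfo=timezone.utc),
)
print(prefix)  # com.acme.checkout/date=2021-06-15
```

A key=value directory name such as date=2021-06-15 follows the Hive-style layout that Athena can map to a partition column, which is one reason a date-based format may suit failed-event queries.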

To upgrade from earlier versions, check out our documentation and upgrade guide.

Full changelog

Bug fixes and performance improvements

Optimise fromEnriched function
Fix duplicate statsd metrics when loading lzo files
Fix dateFormat partitioning in output path
Fix premature shutdown of HTTP connection pool

Under the hood

Update readme
Integrate lacework
Use sbt-dynver
Add Twitter Maven repository
Bump amazon-kinesis-client to 1.14.4
