Enrich 2.0.0 released!

We are very pleased to announce the release of Enrich 2.0.0!

This release marks the maturity of our newest asset, Snowplow Enrich PubSub, which was first announced in version 1.4.0 under the name fs2-enrich, with beta status.

Snowplow Enrich PubSub can be used in a GCP pipeline as a replacement for Beam Enrich. Unlike Beam Enrich, it does not depend on GCP’s Dataflow but rather runs independently on any compute instance. It is therefore significantly less expensive to run compared to Beam Enrich, and yet it can compete with Beam Enrich in event throughput. We estimate it could be up to 60% cheaper than Beam Enrich, although this estimate depends on your event volume.

This new asset is part of our wider plan to remove our dependency on third-party big data frameworks. You can read more about how recent releases of the BigQuery StreamLoader, RDB loader, and the Stream Shredder complement this plan.

Note that we have not deprecated Beam Enrich in this release. Beam Enrich 2.0.0 is also published, with bug fixes and upgraded dependencies that bring in the latest developments in Scio and Dataflow.

Features of Snowplow Enrich PubSub

  • Runs all of the same enrichments you are used to with Beam Enrich

  • Automatic background updates of enrichment assets, such as the MaxMind, IAB, or referer parser DBs

  • Can send runtime metrics to a StatsD server, such as event counts and event latency

  • Can report runtime exceptions to Sentry
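
To give a feel for the StatsD feature listed above: StatsD metrics are plain-text lines of the form name:value|type, where the type is c for a counter, g for a gauge, or ms for a timer. The snippet below is a minimal sketch of parsing such lines, not the code Enrich uses. The metric names in it are hypothetical and chosen for illustration only; check the docs for the names Enrich actually emits.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    value: float
    kind: str  # "c" = counter, "g" = gauge, "ms" = timer


def parse_statsd_line(line: str) -> Metric:
    """Parse one StatsD line of the form name:value|type."""
    name, rest = line.strip().split(":", 1)
    # Anything after the type (e.g. sample rate or tags) is ignored here.
    parts = rest.split("|")
    return Metric(name, float(parts[0]), parts[1])


# Hypothetical metric names, for illustration only.
for raw in ["enrich.example.good_count:120|c", "enrich.example.latency:350|g"]:
    print(parse_statsd_line(raw))
```

A parser like this can be handy in a small test listener when you want to confirm that metrics are arriving before pointing the app at a real StatsD server.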

Other changes in this release

The following changes also benefit users of Beam Enrich and Stream Enrich.

  • Updated YAUAA library to version 5.23

  • Updated the YAUAA context schema with new values for the device class, layout engine class, agent class, and agent security properties

  • Improved error messages for misconfigured enrichments

Upgrade guide

From Beam Enrich to Snowplow Enrich PubSub

Snowplow Enrich PubSub has a different Docker image, command line, and config file from Beam Enrich. It runs on the compute instance where you launch it, whereas Beam Enrich submits a job to Dataflow. You can find out how to run Snowplow Enrich PubSub on our docs website.

From FS2 Enrich (1.4.x) to Snowplow Enrich PubSub 2.0.0

The HOCON config file has changed to reflect the newly available features now that this asset is out of beta. Check out the sample config for the configuration options.

From Stream Enrich (1.4.x) to 2.0.0

Upgrading to Stream Enrich {nsq, kafka, kinesis} 2.0.0 is as easy as bumping the version. No config change is required.

The Enrich repository can be found here, and community contributions are always welcome!