Snowplow upgrade

Hello team,

We are planning to upgrade our Snowplow components on GCP, moving them from Dataflow jobs to App Engine. The upgrade includes moving to the latest versions, as below:
Collector version 2.3.0 to 2.4.1
Enricher from beam enrich 2.0.1 (dataflow job) to enrich pubsub 2.0.3 (app engine)
Bqloader from 0.6.4 (dataflow) to 1.0.1 (on app engine)
Gcsloader from 0.3.1 to 0.3.2 (still remains dataflow)
Repeater Mutator to 1.0.1 (from VM to app engine)

So I just want to check whether you have any guides, steps, or best practices that can be used for the upgrade?

Hi @Srashti

Collector version 2.3.0 to 2.4.1

I recommend going all the way to version 2.4.5, which fixes a few bugs and security vulnerabilities compared to 2.4.1. In most cases the upgrade from 2.3.0 is very easy; there is nothing you need to change in your configuration. But if you terminate SSL at the collector then it’s a little more complicated, because the SSL configuration changed, as described here in the docs.

Enricher from beam enrich 2.0.1 (dataflow job) to enrich pubsub 2.0.3 (app engine)

The latest version is 2.0.5. It’s good that you are moving to enrich-pubsub, because the dataflow version will soon be deprecated. Our docs site has plenty of information on how to run enrich-pubsub. Compared to the dataflow version, it has a different command line and config file.

Bqloader from 0.6.4 (dataflow) to 1.0.1 (on app engine)
Repeater Mutator to 1.0.1 (from VM to app engine)

The latest version is 1.1.0. To help you upgrade from 0.6.4 you could check out this upgrade guide on our docs site or this discourse announcement.

Gcsloader from 0.3.1 to 0.3.2 (still remains dataflow)

This upgrade should be simple. The command line and configuration options remain the same.

Good luck! You should be able to upgrade in any order, because all applications are compatible with each other.
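To keep track of the targets, the versions mentioned above can be collected in a small script. This is only a rough sketch: the component labels and the `planned`/`suggested` keys are made up for illustration, and for the GCS loader no newer version was suggested, so the planned one is reused.

```python
# Versions copied from this thread. "planned" is what the original post proposed;
# "suggested" is the version recommended in the reply (or the planned version
# where the reply did not suggest anything newer).
# The component labels are illustrative, not official artifact names.
UPGRADE_PLAN = {
    "collector": {"planned": "2.4.1", "suggested": "2.4.5"},
    "enrich": {"planned": "2.0.3", "suggested": "2.0.5"},
    "bigquery-loader": {"planned": "1.0.1", "suggested": "1.1.0"},
    "mutator-repeater": {"planned": "1.0.1", "suggested": "1.1.0"},
    "gcs-loader": {"planned": "0.3.2", "suggested": "0.3.2"},
}


def parse(version):
    """Turn a dotted version like '2.4.5' into a comparable tuple (2, 4, 5)."""
    return tuple(int(part) for part in version.split("."))


def behind_suggested(plan):
    """Return the components whose planned version is older than the suggested one."""
    return sorted(
        name
        for name, versions in plan.items()
        if parse(versions["planned"]) < parse(versions["suggested"])
    )


if __name__ == "__main__":
    for name in behind_suggested(UPGRADE_PLAN):
        versions = UPGRADE_PLAN[name]
        print(f"{name}: planned {versions['planned']}, consider {versions['suggested']}")
```

Comparing parsed tuples rather than raw strings avoids mistakes such as treating "0.10.0" as older than "0.9.0".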

Thank you for your reply.
We have our Snowplow pipeline on GCP. Currently we use Dataflow for the enricher and bqloader, and we run the mutator and repeater as a jar on a compute machine.

For the upgrade we are moving from Dataflow to App Engine, so there will be an App Engine service for each of the enricher, bqloader, and mutator/repeater.

Do you have any guidelines on the steps to follow when migrating all of these components from Dataflow to App Engine?

I don’t believe App Engine is officially supported infrastructure, but if you were to head down this path I’d opt for the App Engine Flex runtime with a Dockerfile for each of these components, which would not be wildly dissimilar to containerising them on Kubernetes or running them on individual virtual machines.