Beam Enrich Update Maxmind DB

How do you update the maxmind db file in the beam enrich process? The wiki https://github.com/snowplow/snowplow/wiki/setting-up-beam-enrich describes how to set up the enrichment, but the maxmind files need to be updated from time to time. Are updates to the GCS bucket picked up by the Dataflow job automatically?

Hey @Christoph_Oelmuller - to have a new database used you will need to first upload the new file to GCS and then drain the running Beam Enrich Dataflow job and start a new one - at present updates are not picked up automatically.

thanks for the reply, josh!

No worries - we do hope to have the enrichment process handle external databases in a better way in the future, one that does not require restarting the Dataflow job, but it is not on the immediate roadmap just yet!

we (kind of) solved the situation by creating a script that

  • downloads the mm db and uploads it to the assets bucket
  • spawns a new Dataflow job
  • drains the old job

Replacing the current Dataflow job with the built-in update (drain and replace) doesn't work, because one of the steps can't replace its output sink:

```
The Coder or type for step parallelize@{Enrich.scala:117}/Read(CreateSource)/Read(CreateSource)/Read(BoundedToUnboundedSourceAdapter)/DataflowRunner.StreamingUnboundedRead.ReadWithIds has changed.
```