Looking for a script/solution to build a simple GCP pipeline with BigQuery as sink

Hello all,

Goal

I am looking for a script or mostly automated solution that we can use to build a simple GCP pipeline with BigQuery as the sink.

Background

We have been trying to build a runnable GCP pipeline for a few weeks now, but keep running into various error messages and problems from the enrichment phase onward. What we have tried so far:

I won’t go into much detail in this thread about the current problems with the above approaches (I’ll probably pursue those in parallel in other threads). We had the most hope for the more modern Terraform-based approach in the current Quick Start Guide. Unfortunately, even there the events do not survive the enrichment phase.

This approach (GitHub: etnetera-activate/snowplow-gcp-template) is already very close to the described requirements, but unfortunately it is two years old and uses very old versions in the enrichment phase. That in turn requires old distribution images, outdated Java versions, etc., which creates further problems and is certainly not a sustainable approach.

Long Story Short

Does anyone have a currently working and mostly automated solution to set up the described simple pipeline on GCP with BigQuery as the sink?

I appreciate any form of help.

I appreciate that the Quick Start installation isn’t exactly what you’re after. However, I’d strongly recommend you start there and get that working first. If the Quick Start (the simplest way to deploy your Snowplow infra) isn’t working, then figuring out a more complex deployment will be tricky.

Whilst the Quick Start won’t load to BigQuery (it loads to Postgres instead), it is the easiest and most automated way of using modern Snowplow components.

If you can get that working, then you will be well on your way to figuring out Snowplow and adding BigQuery loading to your pipeline.
