Snowplow Events from Google Bucket to BigQuery

Hi all,
we are saving our Snowplow events (atomic + custom contexts) both to BigQuery and to a Cloud Storage bucket. Because of a bug in our pipeline, we now need to copy the events from the bucket back into BigQuery. Is there any documentation, or does anyone have ideas, on how to associate the raw events in the bucket with the correct event schema?
I would really appreciate your help

Hi @tziegler,

I think that when you say ‘raw events’, you’re referring to the events that come out of the enrich job - if that’s not correct, please let me know.

If that’s the case, I would recommend writing a Dataflow job to insert the data back into the ‘good’ Pub/Sub topic, so that the loader re-loads it into BigQuery. The BQ loader will consume the events and handle all the schema resolution and table mutations required to re-load the data for you. You could try to do it manually, but you’d basically just be re-writing the job the loader already does - which sounds to me like an awful lot of work.
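
If it helps, here’s a rough sketch of what such a job could look like with the Beam Python SDK - this isn’t the actual Snowplow tooling, and the bucket path, project and topic names are placeholders for your own enriched archive and ‘good’ topic. One thing to keep in mind: Pub/Sub writes on the Dataflow runner need the pipeline to run in streaming mode, even though the GCS input is bounded, so you’d drain the job once the backlog has been published.

```python
# Minimal sketch, not the real Snowplow tooling: read enriched TSV rows from
# the GCS archive and publish them unchanged to the "good" Pub/Sub topic so
# the BigQuery Loader picks them up again. Bucket path, project and topic
# names are placeholders - swap in your own.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def run(argv=None):
    options = PipelineOptions(argv)
    # Pub/Sub writes on the Dataflow runner require a streaming pipeline,
    # even though the GCS input here is bounded - drain the job once the
    # backlog has been published.
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEnrichedFromGCS" >> beam.io.ReadFromText(
                "gs://my-enriched-archive/enriched/good/**")       # placeholder path
            | "Utf8Encode" >> beam.Map(lambda line: line.encode("utf-8"))
            | "PublishToGoodTopic" >> beam.io.WriteToPubSub(
                topic="projects/my-project/topics/enriched-good")  # placeholder topic
        )


if __name__ == "__main__":
    run()
```

You’d launch it with the usual Dataflow options (`--runner=DataflowRunner`, `--project`, `--region`, `--temp_location`).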

To give you an example, this project reads bad row data (which comes from failures at the loader) and inserts it from GCS into Pub/Sub. This is basically what you need to do, except your rows are already good, so you just need to insert them as-is.
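
For a modest volume of data you could also skip Dataflow entirely and do the same thing with the plain client libraries - again just a sketch, with placeholder bucket, prefix, project and topic names:

```python
# Sketch only: publish each archived enriched event line as-is to the "good"
# topic using the GCS and Pub/Sub client libraries. All names are placeholders.
from google.cloud import pubsub_v1, storage


def replay_enriched(bucket_name: str, prefix: str, project: str, topic: str) -> None:
    storage_client = storage.Client(project=project)
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project, topic)

    futures = []
    for blob in storage_client.list_blobs(bucket_name, prefix=prefix):
        # Each object holds newline-delimited enriched events (TSV).
        for line in blob.download_as_bytes().splitlines():
            if line:
                # Publish the row exactly as enrich wrote it; the BigQuery
                # Loader handles schema resolution and table mutations.
                futures.append(publisher.publish(topic_path, line))

    # Make sure every message has really been sent before exiting.
    for future in futures:
        future.result()


if __name__ == "__main__":
    replay_enriched(
        bucket_name="my-enriched-archive",  # placeholders throughout
        prefix="enriched/good/",
        project="my-project",
        topic="enriched-good",
    )
```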

Hope that helps!

Best,
