BigQuery Loader - Mutator

I have a couple of questions regarding the BigQuery loader and mutator.
I have a pipeline in production that I provision entirely with Terraform. It has one VM group for the collector and another VM group that runs Beam Enrich and the BigQuery Mutator & Loader.

When I update my Snowplow BigQuery Loader, the VM group restarts and re-executes the mutator task that creates the table in BigQuery.
First question: If my table already exists, will the mutator overwrite my table?

Secondly, I would like to use the Snowplow deployment to collect, enrich and store the data of multiple websites. Is it possible to separate the data into multiple tables while still using the same pipeline?

Thanks in advance,
Sam

No, the mutator shouldn’t overwrite your existing table. By restarting it you’ll clear its internal cache of the columns it has created, but this cache is refreshed on initialisation.
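
To make the "create only if missing, then refresh the cache" idea concrete, here is a toy Python model. It is not Snowplow's code: `ToyMutator`, the in-memory `tables` dict and the column names are all made up for illustration, and the real mutator talks to BigQuery rather than a dict.

```python
class ToyMutator:
    """Toy stand-in for a schema mutator; NOT Snowplow's implementation."""

    def __init__(self, tables: dict, table_name: str):
        self.tables = tables          # hypothetical stand-in for BigQuery
        self.table_name = table_name
        # Create the table only if it is missing; an existing one is left alone.
        self.tables.setdefault(table_name, {"columns": [], "rows": []})
        # Rebuild the in-memory cache from the table's current columns.
        self.cache = set(self.tables[table_name]["columns"])

    def ensure_column(self, column: str) -> bool:
        """Add a column if unknown; return True when the table was altered."""
        if column in self.cache:
            return False
        self.tables[self.table_name]["columns"].append(column)
        self.cache.add(column)
        return True


if __name__ == "__main__":
    tables = {}
    first = ToyMutator(tables, "events")
    first.ensure_column("contexts_example_1")      # hypothetical column name
    tables["events"]["rows"].append({"contexts_example_1": "x"})

    # Simulate the VM group restarting: a fresh mutator, same table.
    second = ToyMutator(tables, "events")
    print(second.ensure_column("contexts_example_1"))  # column already known
    print(len(tables["events"]["rows"]))               # existing rows untouched
```

In this model a restart only loses the cache, which is rebuilt from the table itself, so neither the table nor its rows are replaced.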

If you are running a single pipeline there are a few different options:

  • Run multiple collectors / pipelines (this might be desirable for first-party cookie setting / ITP)
  • Split data out at enrich time (and have one BQ loader for each app_id, for example)
  • Split data once it has been sunk into BigQuery by creating an incremental table or view (you probably want a materialised view) per app_id that is refreshed on a frequent basis.
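
As a rough sketch of the last option, the Python below generates one BigQuery view definition per app_id. The project, dataset and table names (`my-project`, `snowplow`, `snowplow_views`, `events`) and the example app_ids are placeholders. It also emits plain logical views rather than materialised ones, since materialised views carry extra restrictions that are worth checking in the BigQuery documentation first.

```python
import re

# Hypothetical fully qualified events table; substitute your own.
EVENTS_TABLE = "my-project.snowplow.events"


def view_name(app_id: str) -> str:
    """Turn an app_id into a safe BigQuery view name."""
    return "events_" + re.sub(r"[^A-Za-z0-9_]", "_", app_id)


def view_ddl(app_id: str, dataset: str = "my-project.snowplow_views") -> str:
    """Build the DDL for a view holding only one app_id's events."""
    # app_id is inlined as a string literal, so reject quotes and backslashes.
    if "'" in app_id or "\\" in app_id:
        raise ValueError(f"unsupported app_id: {app_id!r}")
    return (
        f"CREATE OR REPLACE VIEW `{dataset}.{view_name(app_id)}` AS\n"
        f"SELECT * FROM `{EVENTS_TABLE}`\n"
        f"WHERE app_id = '{app_id}'"
    )


if __name__ == "__main__":
    # Placeholder app_ids; use the values your trackers actually send.
    for app in ["shop-web", "blog"]:
        print(view_ddl(app) + ";\n")
```

The printed statements could be run from the BigQuery console or applied through whatever tooling already provisions the dataset.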

Thanks Mike! This answers the questions superbly.

Having given this point a second thought: if I have multiple tags in different GTMs sending to the same collector (same IP & domain), are the cookies no longer first-party?

These can still be first-party cookies (for domain_userid), but the question becomes whether you want to stitch network_userids across these sites.

Re: multiple tags on different domains - the more domains you run from a single collector, the greater the risk that ITP / WebKit will flag the collector domain as engaged in cross-site tracking.

That is not the purpose; the aim would be to spread costs across lower-traffic sites.

Is there more reading material on this? What do I need to do to prevent it?

Thanks for your clear explanation.

I’ve found https://www.cookiestatus.com/ from @simoahava to be a particularly useful resource for this across different browser implementations.