I’m in the middle of validating my Snowplow data and realized that my BigQuery table contains duplicate `event_id` values.
This is an example query for one day of data:
From reading other posts, it sounds like the GCP pipeline currently has no deduplication mechanism. Is that right? The event fingerprint enrichment was also mentioned in those threads.
What is the best practice for solving this on GCP? Are you handling it in the data modeling stage?
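For context, the kind of modeling-stage logic I have in mind is roughly the sketch below. It assumes each row carries `event_id`, `event_fingerprint` (populated by the event fingerprint enrichment) and `collector_tstamp`, and it treats rows with the same `event_id` and the same fingerprint as natural duplicates (keep the earliest) and rows with the same `event_id` but different fingerprints as synthetic duplicates (drop them all). Dropping synthetic duplicates is my own assumed policy, not necessarily what Snowplow recommends; `deduplicate` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Iterable


def deduplicate(events: Iterable[dict]) -> list[dict]:
    """Return one row per event_id (hypothetical helper, assumed policy).

    - Natural duplicates (same event_id, same event_fingerprint):
      keep only the row with the earliest collector_tstamp.
    - Synthetic duplicates (same event_id, different fingerprints):
      the rows are genuinely different events, so drop all of them.
    """
    # Group all rows by their event_id, preserving first-seen order.
    by_id: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        by_id[event["event_id"]].append(event)

    result = []
    for rows in by_id.values():
        fingerprints = {row["event_fingerprint"] for row in rows}
        if len(fingerprints) > 1:
            # Synthetic duplicates: ambiguous which row is "real", so skip.
            continue
        # Natural duplicates (or a single row): keep the earliest copy.
        result.append(min(rows, key=lambda row: row["collector_tstamp"]))
    return result
```

In BigQuery itself the same idea would presumably be expressed with a window function over `event_id`, but the grouping rules are what I'm unsure about.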