Debugging bad data in GCP with BigQuery

One of the key features of the Snowplow pipeline is that it’s architected to ensure data quality up front: rather than spending a lot of time cleaning and making sense of the data before using it, you define schemas up front and the pipeline uses them to validate events as they flow through. Another key feature is that the pipeline is highly loss-averse: when an event fails validation, it is preserved as a bad row rather than discarded. Read more about data quality.
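As a taste of what this looks like in practice, here is a minimal sketch of the kind of BigQuery query you might run once bad rows have been loaded into a table. The project, dataset, table, and column names (`my_project.snowplow.bad_rows`, `errors`, `failure_tstamp`) are illustrative assumptions and will depend on how your own pipeline writes bad rows and how you load them into BigQuery.

```sql
-- Count the most common validation error messages over the last 7 days.
-- Assumes bad rows are loaded into `my_project.snowplow.bad_rows` with a
-- repeated `errors` record (containing a `message` field) and a
-- `failure_tstamp` timestamp column; adjust names to your own setup.
SELECT
  error.message AS error_message,
  COUNT(*) AS occurrences
FROM
  `my_project.snowplow.bad_rows`,
  UNNEST(errors) AS error
WHERE
  failure_tstamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
  error_message
ORDER BY
  occurrences DESC
LIMIT 20;
```

Grouping by error message is usually the quickest way to see whether bad rows are dominated by a handful of schema violations or by noise such as bot traffic, which then tells you where to dig deeper.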


This is a companion discussion topic for the original entry at https://snowplowanalytics.com/blog/2018/12/19/debugging-bad-data-in-gcp-with-bigquery/