I have built a bad rows pipeline with the Google Cloud Storage loader. For now, I use the GCS loader to read only the enriched-bad topic in Pub/Sub, and I have only received five kinds of errors: schema violations, adapter failures, enrichment failures, collector_payload_format_violation, and tracker_protocol_violations. However, eleven error types are listed in this repo (snowplow-badrows-tables/bigquery at master · snowplow-incubator/snowplow-badrows-tables · GitHub). I am also not quite sure how I can collect all the existing bad rows in my pipeline. Do I need to set up a separate Dataflow job for each topic?
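To make the question concrete, this is roughly what I imagine: one loader job per bad-row subscription. The flag names (`--inputSubscription`, `--outputDirectory`) and the subscription names here are my assumptions based on the loader's documentation, not something I have verified for every version:

```
# One GCS loader Dataflow job per bad-row Pub/Sub subscription (sketch).
# Subscription names below are hypothetical placeholders.
docker run snowplow/snowplow-google-cloud-storage-loader \
  --runner=DataflowRunner \
  --project=my-project \
  --inputSubscription=projects/my-project/subscriptions/enriched-bad-sub \
  --outputDirectory=gs://my-bucket/bad-rows/enriched-bad/

docker run snowplow/snowplow-google-cloud-storage-loader \
  --runner=DataflowRunner \
  --project=my-project \
  --inputSubscription=projects/my-project/subscriptions/collector-bad-sub \
  --outputDirectory=gs://my-bucket/bad-rows/collector-bad/
```

Is duplicating jobs like this the intended approach, or is there a way to fan all bad-row topics into one job?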
My other question is how to classify all the error types into the four categories shown in the image below.
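So far, my working assumption is that I can classify a bad row by its self-describing schema URI, since every bad row carries a `schema` field like `iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0`. A minimal sketch of what I have in mind (the grouping itself would then be a lookup table I fill in once I know the four categories; the function name is mine):

```python
import json

def failure_type(bad_row_json: str) -> str:
    """Extract the failure type from a self-describing bad row.

    Bad rows are self-describing JSON whose "schema" field looks like
    iglu:com.snowplowanalytics.snowplow.badrows/<name>/jsonschema/<version>;
    the second path segment is the failure type name.
    """
    schema = json.loads(bad_row_json)["schema"]
    # "iglu:<vendor>/<name>/<format>/<version>" -> take <name>
    return schema.split("/")[1]

# Hypothetical: map each failure type to one of the four categories
# from the image. I would fill this in once the grouping is clear.
CATEGORY = {
    "schema_violations": "category-1",
    "adapter_failures": "category-2",
    # ... remaining nine types ...
}
```

For example, `failure_type` applied to a schema-violation bad row returns `"schema_violations"`, which I could then look up in `CATEGORY`. Does this match how others bucket the eleven types?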