Bq-failed-inserts topic reason

I am new to Snowplow. I just set up a Snowplow pipeline on GCP, including the tracker, collector, enrich, and storage (BigQuery). Most of the event data arrives in BigQuery as expected, but some data is always missing. I tried to figure out the problem and eventually found the missing events in the bq-failed-inserts topic (Pub/Sub). I also tried using the BigQuery Repeater to send them to BigQuery again, but that failed too. I am not sure why they end up in the bq-failed-inserts topic. I don't think it is a schema problem; if it were, the events would be dumped into the enriched-bad topic with schema-violation errors. I also considered a latency issue, but it is always the same events that fail, which is really weird.

Most of the time, failed inserts are due to mutator lag, but in certain circumstances it may simply not be possible to insert a given row into BigQuery (e.g. because of a discrepancy between the JSON schema and the BigQuery table schema).

Do you have an example row / schema that reproduces the issue? Do all event / entity schemas in the payload have a corresponding column in the BigQuery table?

Thanks Mike. I am using GTM for the event tracking, and I have confirmed that the JSON schema of the dataLayer matches the BigQuery schema. As I mentioned, all the missing data was dumped into the bq-failed-inserts topic. When I ran the BigQuery Repeater, the data ended up in my Google Cloud Storage bucket as a loader recovery error. The following is part of the bad-rows information.

{"schema":"iglu:com.snowplowanalytics.snowplow.badrows/loader_recovery_error/jsonschema/1-0-0","data":{"processor":{"artifact":"snowplow-bigquery-repeater","version":"0.6.1"},"failure":{"error":{"message":"no such field: contexts_com_snowplowanalytics_snowplow_geolocation_context_1_1_0.","location":"contexts_com_snowplowanalytics_snowplow_geolocation_context_1_1_0","reason":"invalid"}},"payload":...

Actually, in my front end I have never used a field like contexts_com_snowplowanalytics_snowplow_geolocation_context_1_1_0, so I am not sure why this happens. Do you have any idea?

If you are using the JavaScript tracker, this will be caused by setting geolocation: true in the contexts object when initialising the tracker.
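For reference, the setting in question looks roughly like this in the tracker initialisation (collector endpoint and app id here are placeholders):

```javascript
// Snowplow JavaScript tracker initialisation (v2-style snippet API).
// With geolocation: true, the tracker attaches the standard
// geolocation_context entity to every event, which is why the loader
// expects a matching column in BigQuery.
snowplow('newTracker', 'sp', 'collector.example.com', {
  appId: 'my-app',          // placeholder
  contexts: {
    geolocation: true       // the likely culprit
  }
});
```

Setting geolocation to false (or removing it) stops the entity from being attached, if you don't need that data.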

I’m quite curious as to what in the payload is making it invalid, if you are able to anonymise the fields.