Could not commit request due to validation error

Hi!
I’m getting the error “Could not commit request due to validation error: INVALID_ARGUMENT: Pubsub publish requests are limited to 10MB, rejecting message over to avoid exceeding limit with byte64 request encoding” in my enrich step.
How can I fix it?

Which Pub/Sub topic is it failing on: the raw or the enriched topic? Are you using beam-enrich or FS2? Is it a single event?

Beam Enrich sets the maximum record size to 7 MB, so in theory a record above this limit should be resized and emitted as a bad row instead of failing.
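The behaviour described above amounts to a size guard in front of the publish step. The sketch below illustrates the idea in Python; it is not beam-enrich’s actual (Scala) code. `MAX_RECORD_SIZE` and `split_by_size` are hypothetical names, and reading “7 MB” as 7,000,000 bytes is an assumption.

```python
from typing import Iterable, List, Tuple

# Assumption: "7 MB" read as 7,000,000 bytes; the exact constant and how
# beam-enrich measures a record are not taken from its source code.
MAX_RECORD_SIZE = 7_000_000


def split_by_size(
    records: Iterable[bytes], limit: int = MAX_RECORD_SIZE
) -> Tuple[List[bytes], List[int]]:
    """Split records into publishable ones and the sizes of oversized ones."""
    good: List[bytes] = []
    oversized: List[int] = []
    for record in records:
        if len(record) > limit:
            # Where a real pipeline would emit a size-violation bad row,
            # this sketch just records the offending size.
            oversized.append(len(record))
        else:
            good.append(record)
    return good, oversized


good, bad = split_by_size([b"a" * 10, b"b" * 7_500_000])
print(len(good), bad)  # 1 [7500000]
```

The point of such a guard is that an oversized record becomes a bad row and the pipeline keeps moving, rather than the publish request failing and being retried.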

Hi Mike. I use beam-enrich.

I followed the tutorial by Simo Ahava: Install Snowplow On The Google Cloud Platform | Simo Ahava's blog

The error appears in the Dataflow step of the beam-enrich worker.
It stops all processing, and no data is inserted into BQ.

In case it’s relevant, the good-sub subscription has a huge backlog (over 80M unacked messages).

Error log

{
  "insertId": "7514256621418980731:34459:0:179556",
  "jsonPayload": {
    "line": "active_work_manager.cc:1564",
    "message": "132593 Could not commit request due to validation error: INVALID_ARGUMENT: Pubsub publish requests are limited to 10MB, rejecting message over 7168K (size 7245K) to avoid exceeding limit with byte64 request encoding.",
    "thread": "194"
  },
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "job_name": "beam-enrich",
      "project_id": "XXXXXXXXXX",
      "region": "europe-central2",
      "job_id": "2021-03-30_14_43_03-13952642482494084906",
      "step_id": ""
    }
  },
  "timestamp": "2021-03-31T08:38:58.534286Z",
  "severity": "ERROR",
  "labels": {
    "compute.googleapis.com/resource_name": "beam-enrich-03301443-wkq8-harness-w1zs",
    "dataflow.googleapis.com/log_type": "system",
    "dataflow.googleapis.com/job_id": "2021-03-30_14_43_03-13952642482494084906",
    "dataflow.googleapis.com/region": "europe-central2",
    "compute.googleapis.com/resource_type": "instance",
    "compute.googleapis.com/resource_id": "7514256621418980731",
    "dataflow.googleapis.com/job_name": "beam-enrich"
  },
  "logName": "projects/XXXXXXXXXX/logs/dataflow.googleapis.com%2Fshuffler",
  "receiveTimestamp": "2021-03-31T08:39:21.828671489Z"
}

Does anybody know how to avoid this error?
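The sizes in the error log above can be checked with a little arithmetic. Base64 encodes every 3 input bytes as 4 output characters, so a payload grows by roughly a third once it is encoded into a publish request. The sketch assumes “K” in the Dataflow message means KiB (1,024 bytes); it only reproduces the arithmetic and does not describe how Dataflow actually derives its cutoff.

```python
KIB = 1024  # assumption: "K" in the Dataflow log means KiB


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of raw data."""
    # Every started group of 3 input bytes becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


cutoff = 7168 * KIB   # "rejecting message over 7168K"
message = 7245 * KIB  # "(size 7245K)"

print(cutoff, base64_size(cutoff))    # 7340032 9786712
print(message, base64_size(message))  # 7418880 9891840
print(message > cutoff)               # True
```

Under that assumption, 7168K is exactly 7 MiB and the rejected message is about 77K over it. Both encoded sizes are still below 10,000,000 bytes, which suggests the cutoff is deliberately conservative and leaves room for the rest of the request; that reading is an inference, not documented behaviour.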

Some of those instructions look a bit out of date and may use older versions of the software.

I’d try the setup from the latest Snowplow documentation, which includes all the instructions: Setup Snowplow Open Source on GCP - Snowplow Docs

@mike, thank you for your response.

I believe I’m using the latest versions of the software:

https://dl.bintray.com/snowplow/snowplow-generic/snowplow_scala_stream_collector_google_pubsub_1.0.1.zip

https://dl.bintray.com/snowplow/snowplow-generic/snowplow_beam_enrich_1.2.3.zip

https://dl.bintray.com/snowplow/snowplow-generic/snowplow_bigquery_loader_0.6.1.zip

https://dl.bintray.com/snowplow/snowplow-generic/snowplow_bigquery_mutator_0.6.1.zip

Is that right?

Bintray is being retired shortly (see here), so for the moment I’d opt for Docker Hub, as that should have the latest images of everything.

The error still isn’t solved. Please help!

Hi!
On GitHub I got a message that this error should be fixed on the tracking side: Beam: reduce MaximumRecordSize to 6900000 bytes · Issue #287 · snowplow/enrich · GitHub

How can I see the messages that trigger this error?

Thank you!
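The MaximumRecordSize proposed in issue #287 can be sanity-checked against the cutoff from the error log. Assuming again that “K” means KiB, the proposed 6,900,000 bytes sits below Dataflow’s 7168K threshold; the constant names here are illustrative only.

```python
KIB = 1024  # assumption: "K" in the Dataflow log means KiB

DATAFLOW_CUTOFF = 7168 * KIB  # from the log: "rejecting message over 7168K"
PROPOSED_MAX = 6_900_000      # MaximumRecordSize proposed in issue #287

headroom = DATAFLOW_CUTOFF - PROPOSED_MAX
print(DATAFLOW_CUTOFF, PROPOSED_MAX, headroom)  # 7340032 6900000 440032
```

That is roughly 430 KiB of margin between the proposed enrich limit and the size at which Dataflow refuses to publish.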

It is possible to see the Dataflow logs in the GCP Logs Explorer.

In the Logs Explorer, you can filter by the resource type “Dataflow Step” and by the enrichment job ID.
Filtering on those two fields yields the enrichment logs.

Attaching a screenshot of how the filter can be set up.
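The two filters described above can also be written as a Cloud Logging query string. The `resource.type` value and the job ID below come from the error log earlier in this thread; the helper name `enrich_error_filter` and the extra severity clause are assumptions added for illustration.

```python
def enrich_error_filter(job_id: str, min_severity: str = "ERROR") -> str:
    """Build a Logging query for one Dataflow job's step logs (hypothetical helper)."""
    clauses = [
        'resource.type="dataflow_step"',          # the "Dataflow Step" resource type
        f'resource.labels.job_id="{job_id}"',     # the enrichment job ID
        f"severity>={min_severity}",              # optional: errors only
    ]
    return " AND ".join(clauses)


query = enrich_error_filter("2021-03-30_14_43_03-13952642482494084906")
print(query)
```

The resulting string can be pasted into the query box of the Logs Explorer, or passed as the `filter_` argument of `list_entries` on a `google.cloud.logging.Client` if you prefer to read the logs from a script.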

Hi @jrluis,
Thank you for your response.

I can’t find any hits from the frontend that are larger than 7 MB.
Can I find them in the GCP logs or in Pub/Sub?

How can I skip this request (which is larger than 7 MB) so that the worker continues reading the queue?
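For locating large messages in Pub/Sub, one hedged approach is to pull a batch from the subscription without acknowledging it and rank the payloads by size. This is a sketch under assumptions: the helper names are hypothetical, it needs the `google-cloud-pubsub` package, and it does not skip or delete anything, because unacked messages are redelivered after the ack deadline. Note also that the enriched event can differ in size from the collector payload it came from, so raw payload size is only a starting point.

```python
from typing import List, Tuple


def largest(pairs: List[Tuple[str, int]], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n biggest (message_id, size_in_bytes) pairs, largest first."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


def peek_sizes(project: str, subscription: str, max_messages: int = 100) -> List[Tuple[str, int]]:
    """Pull one batch WITHOUT acking it and return (message_id, size) pairs.

    Unacked messages go back to the subscription after the ack deadline,
    so this only inspects the backlog; it does not remove anything.
    """
    # Imported lazily so that `largest` works without the package installed.
    from google.cloud import pubsub_v1

    with pubsub_v1.SubscriberClient() as subscriber:
        path = subscriber.subscription_path(project, subscription)
        response = subscriber.pull(
            request={"subscription": path, "max_messages": max_messages}
        )
    return [(rm.message.message_id, len(rm.message.data)) for rm in response.received_messages]
```

A call such as `largest(peek_sizes("my-project", "good-sub"))` would list the biggest payloads in one pulled batch; the project ID is a placeholder. While this runs, the pulled messages are temporarily unavailable to the Dataflow job until their ack deadline expires.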

The error still isn’t solved. Please help!