Snowplow bad rows and POST requests


#1

Hi,

I had a question regarding events which are sent in a POST request and are failing validation.

We are sending multiple events at once to the Scala Stream Collector using POST requests.
Those events are then stored on S3 and processed by EmrEtlRunner to be loaded into Redshift.

We’ve noticed that when an event fails validation, a file is generated in the enriched/bad folder containing the whole POST request with the failed event.

We were wondering whether that means every event in that POST request has failed validation (so none of them are loaded into Redshift), or whether only the failed event is left out of Redshift.

Thank you,
Arthur


#2

@abrenaut, indeed, the whole payload will end up in the bad bucket. However, any good events from that payload will be processed normally and loaded into Redshift.

The implication is that if you run a recovery on the bad data, the good events in that payload get processed a second time, so you might end up with duplicated events in Redshift. You might need to take some steps to eliminate the duplicates if they cause issues for your analytics.
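
To illustrate the duplicate problem, here is a minimal Python sketch, not an official Snowplow tool. It assumes each event keeps the same `event_id` when the payload is replayed, and the helper `deduplicate_by_event_id` and the sample data are hypothetical names made up for this example. In practice you would more likely do the equivalent in SQL against Redshift.

```python
from typing import Dict, Iterable, List

Event = Dict[str, str]


def deduplicate_by_event_id(events: Iterable[Event]) -> List[Event]:
    """Keep the first occurrence of each event_id, preserving order.

    Hypothetical helper: assumes event_id is stable across the recovery run.
    """
    seen = set()
    unique = []
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            # Already loaded on an earlier run, so skip the replayed copy.
            continue
        seen.add(event_id)
        unique.append(event)
    return unique


# First run: the POST payload held events a, b and c.
# "a" and "b" passed validation and were loaded; "c" failed.
first_load = [{"event_id": "a"}, {"event_id": "b"}]

# Recovery replays the whole payload from the bad bucket,
# so "a" and "b" come through again alongside the fixed "c".
recovered = [{"event_id": "a"}, {"event_id": "b"}, {"event_id": "c"}]

deduplicated = deduplicate_by_event_id(first_load + recovered)
print([e["event_id"] for e in deduplicated])  # ['a', 'b', 'c']
```

Without the deduplication step, the combined data would contain five rows, with "a" and "b" each appearing twice.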


#3

Thanks for the quick reply @ihor!