Rejecting bad events at ingestion


#1

I’m curious: is there a reason Snowplow collectors don’t validate JSON events at the point of collection and return an error response for rows that don’t match their schema? I have faced a few situations recently where this would have saved a huge amount of head-scratching.

This is obviously impossible for the CloudFront collector, but it seems straightforward for the others and potentially very useful. I can see a few cases where some decisions would need to be made (e.g. what happens to a request containing multiple events), but is there a philosophical reason why it hasn’t been implemented?


#2

Hi @acgray - yes, the philosophical point is that a collector should not do any processing, such as validation or enrichment. All of this should be done downstream of collection.

While a collector rejecting bad data is useful for debugging (and you can achieve something similar with Snowplow Mini), it doesn’t work at scale: your end users’ thousands of web browsers or mobile apps can’t do anything with a 4xx from a collector endpoint.
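The split described above can be sketched as follows. This is a minimal illustration assuming a toy schema of required field names and types; `collect`, `validate`, `process` and `SCHEMA` are hypothetical names, not Snowplow APIs, and real pipelines validate against full JSON Schemas. The collector stores every payload unchanged and always answers 200, while a separate downstream step validates each payload and keeps failures, together with their errors, for later inspection instead of rejecting them at the edge.

```python
import json
from typing import Any

# Hypothetical, simplified "schema": required field names mapped to expected types.
# Only illustrates the flow; it is not how Snowplow schemas are defined.
SCHEMA = {"event_name": str, "user_id": str, "value": (int, float)}


def collect(raw_body: str, raw_sink: list[str]) -> int:
    """Collector stage: store the payload unchanged and always acknowledge it."""
    raw_sink.append(raw_body)
    return 200  # no validation here, so the client never receives a 4xx


def validate(raw_body: str) -> list[str]:
    """Return a list of problems; an empty list means the event is valid."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return ["payload is not a JSON object"]
    problems = []
    for field, expected in SCHEMA.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems


def process(raw_sink: list[str]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Downstream stage: split collected payloads into good events and bad rows."""
    good, bad = [], []
    for raw in raw_sink:
        problems = validate(raw)
        if problems:
            # Keep the original payload so it can be debugged or reprocessed later.
            bad.append({"payload": raw, "errors": problems})
        else:
            good.append(json.loads(raw))
    return good, bad


if __name__ == "__main__":
    raw: list[str] = []
    collect('{"event_name": "click", "user_id": "u1", "value": 3}', raw)
    collect('{"event_name": "click"}', raw)
    collect("not json", raw)
    good, bad = process(raw)
    print(len(good), len(bad))  # → 1 2
```

Because the collector never inspects the payload, a malformed event costs the client nothing and is still preserved downstream, which is where someone can actually act on it.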