We’ve been using Snowplow self-describing events for a long time, and only recently noticed that some of them have no “real” counterpart in the events table (joining on root_id = event_id returns nothing).
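To make the check concrete, here is a minimal sketch of the anti-join we mean. It runs against an in-memory SQLite database rather than our actual warehouse, and the shredded table name `my_custom_event`, the helper `find_orphans`, and the sample rows are all made up for illustration. Only the `root_id`, `root_tstamp` and `event_id` columns mirror the real tables.

```python
import sqlite3


def find_orphans(conn, shredded_table):
    """Return (root_id, root_tstamp) rows that have no parent row in events."""
    # The table name is interpolated directly, so only pass trusted names here.
    query = f"""
        SELECT s.root_id, s.root_tstamp
        FROM {shredded_table} AS s
        LEFT JOIN events AS e ON e.event_id = s.root_id
        WHERE e.event_id IS NULL
        ORDER BY s.root_tstamp
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: one event with a parent row, one without.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id TEXT, collector_tstamp TEXT);
    CREATE TABLE my_custom_event (root_id TEXT, root_tstamp TEXT);
    INSERT INTO events VALUES ('a1', '2018-01-10 12:00:00');
    INSERT INTO my_custom_event VALUES ('a1', '2018-01-10 12:00:00');
    INSERT INTO my_custom_event VALUES ('b2', '2018-01-11 09:30:00');
""")

orphans = find_orphans(conn, "my_custom_event")
print(orphans)
```

Sorting the orphaned rows by `root_tstamp` is how the start date of the problem becomes visible.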
The first of these orphaned rows appears on exactly the date we upgraded to Snowplow R97 (at the same time, we switched from the Clojure collector to the Scala collector with Kinesis streaming). The number of bad events is small compared to normal ones (below 1%), so we cannot really tell from overall volumes whether anything is missing.
How can this be? Each event arrives as a single line and should therefore stay together until shredding and loading, so how can part of it go missing?