Today we stumbled upon this issue. Basically, a client-side self-describing event was fired with a field that contained the Unicode null character (\u0000).
To unblock the storage loader, we manually removed the character from the event and reran it. This is only the second time it has happened to us (the last time was about a year ago), but it could be quite disruptive if someone started sending such events on purpose.
We considered how to fix the issue properly and we came up with some options:
- Force the tracker users to remove those chars before sending the event. It's not ideal because the sanitizing code would be spread across many places.
- Remove those chars in the tracker. We don't like it much either, because it would require changing a lot of trackers.
- Use the event schemas to invalidate events that contain this character. We don't like it much because it would complicate most of the schemas. The whole event would also be discarded, which is not ideal.
- Add some sanitizing code in scala-common-enrich that removes null characters for all self-describing events. Something like event.unstruct_event = sanitizeString(event.unstruct_event).
Of the alternatives, we prefer the fourth one, because it centralizes the logic in one place and doesn't force the tracker or its users to deal with a DB-specific issue. On the other hand, the solution is quite blunt.
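To make the fourth option more concrete, here is a minimal sketch of what such a helper could look like. The object name NullCharSanitizer and the sanitizeString function are hypothetical (nothing like this exists in scala-common-enrich as far as we know), and it assumes the null shows up as a real character in the string rather than as an escaped sequence inside still-serialized JSON.

```scala
// Hypothetical helper, not an existing scala-common-enrich API.
object NullCharSanitizer {

  // The Unicode null character (code point U+0000).
  private val NullChar: Char = '\u0000'

  /**
   * Returns the input with every null character removed.
   * A null reference is passed through unchanged, since
   * optional event fields may be unset.
   */
  def sanitizeString(s: String): String =
    if (s == null) null else s.filter(_ != NullChar)
}
```

It would be called the same way as in the list above, e.g. event.unstruct_event = NullCharSanitizer.sanitizeString(event.unstruct_event). Note that this drops the character silently instead of rejecting the event, which is exactly the blunt part.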
What do you think? Maybe you have other options?