Duplicate event_id

Hi snowplowers,
We have a question about duplicated event_ids. The number of records sharing an event_id is huge:


An important detail is that we have two web apps and only one of them shows this behavior: the one that is under QA stress testing.

Adding one more screenshot of an example:

Please let me know if you have had a similar experience or know the root cause. Thank you

This tends to happen with bots / crawlers / other automation tools, but of course it can also happen naturally, as nothing guarantees that an event_id is truly unique at event time (this can be done but would be quite an expensive operation).

What QA tool are you using, out of interest? Many crawlers / tools avoid expensive operations such as generating random numbers (which event_id relies on), or seed the RNG ahead of time, so it’s not uncommon to see event_ids repeated.
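
To illustrate the seeding point, here is a minimal sketch (not the tracker's actual code). It assumes a hypothetical automation tool that swaps in a deterministic PRNG with a fixed seed; `mulberry32`, `uuidFrom` and `simulatePageLoad` are names made up for this example. Every simulated page load then produces the same sequence of IDs:

```javascript
// Hypothetical stand-in for a tool's deterministic Math.random replacement
// (mulberry32, a small seeded PRNG). Same seed in, same sequence out.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Illustrative v4-style UUID built from whatever random() it is given.
function uuidFrom(random) {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Each "page load" re-creates the PRNG from the same seed, as a seeded
// headless browser would, then fires a few events.
function simulatePageLoad(seed, eventCount) {
  const random = mulberry32(seed);
  return Array.from({ length: eventCount }, () => uuidFrom(random));
}

const firstLoad = simulatePageLoad(42, 3);
const secondLoad = simulatePageLoad(42, 3);

// The two loads share every event_id, position by position.
console.log(firstLoad.every((id, i) => id === secondLoad[i]));
```

Within one load the IDs still differ from each other; the duplicates only show up across loads, which is the kind of pattern a stress test that opens many fresh sessions could produce.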

Thank you Mike, but would there be such a large number of duplicated ids even with bots?
And BTW we are using the v2 js tracker, do you know if v3 would resolve this issue?
Thank you

One of the issues here is that the js tracker falls back to a Math.random() based implementation for UUID generation when the improved crypto.getRandomValues() isn’t available.
Many tools and bots have poor implementations of Math.random() that aren’t very random at all (which leads to lots of UUID collisions) and have no implementation of crypto.getRandomValues(), so we can’t use that either.
v3 doesn’t fix this issue as we still fall back. In v4 we will drop support for IE9 and 10 (this is likely a while away though!) and with that, this fallback will also be removed. We discuss it (there’s quite a lot of discussion) here: Replace Math.random to prevent duplicate event IDs · Issue #499 · snowplow/snowplow-javascript-tracker · GitHub
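
The fallback described above can be sketched roughly like this. It is a simplified illustration under my own assumptions, not the tracker's real implementation: `generateEventId` and its two parameters are hypothetical and exist only so the two code paths can be exercised separately.

```javascript
// Sketch of "prefer crypto.getRandomValues(), else fall back to Math.random()".
// cryptoImpl and random are injectable purely for illustration.
function generateEventId(cryptoImpl = globalThis.crypto, random = Math.random) {
  const bytes = new Uint8Array(16);

  if (cryptoImpl && typeof cryptoImpl.getRandomValues === "function") {
    // Preferred path: cryptographically strong random bytes.
    cryptoImpl.getRandomValues(bytes);
  } else {
    // Fallback path: only as good as the environment's Math.random().
    for (let i = 0; i < bytes.length; i++) {
      bytes[i] = Math.floor(random() * 256);
    }
  }

  // Stamp the UUID version (4) and variant bits.
  bytes[6] = (bytes[6] & 0x0f) | 0x40;
  bytes[8] = (bytes[8] & 0x3f) | 0x80;

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Simulate a bot with no crypto and a degenerate Math.random():
const botId1 = generateEventId(null, () => 0);
const botId2 = generateEventId(null, () => 0);
console.log(botId1 === botId2); // both are 00000000-0000-4000-8000-000000000000
```

With no crypto available, the quality of the ID depends entirely on the `random` function, so an environment whose Math.random() is constant or repeats will emit colliding event_ids no matter how the UUID is assembled.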

Hard to know without looking at the data itself - but as @PaulBoocock has mentioned it’s often quite browser dependent.