I’m not sure I fully understand the issue. But let’s start from the beginning to be sure we’re not confusing definitions - sorry for the lengthy response.
Identical events with different `dvce_sent_tstamp` are usually created due to the absence of exactly-once delivery in the pipeline: the tracker can send an event, not get a response from the collector, and re-send it, while the collector actually received the original event.
It also happens due to third-party software installed on the user’s machine, such as anti-virus programs or adult-content filters. They work by duplicating the user’s HTTP requests: the first request hits the content filter’s web server, which checks whether the request is “safe” (e.g. not trying to download malware). If it is, the server allows the filter installed on the client machine to re-send the HTTP request to its original destination, which in our case is your collector.
This can also be caused by web scrapers, or by slightly different mechanisms, but the result is basically the same: the same event hits the collector more than once. This is what we call natural duplicates: they’re the same event. Synthetic duplicates are also the result of third-party software, but they have a different user-set payload, so they can be different events.
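To make the distinction concrete, here is a minimal Python sketch of how two events sharing an `event_id` could be told apart. It is only an illustration, not the pipeline's actual code: the `fingerprint` and `classify` helpers are hypothetical, and the set of ignored fields (`dvce_sent_tstamp` and `collector_tstamp`) is an assumption about which timestamps legitimately differ between re-sends.

```python
import hashlib
import json

# Assumption: these fields may differ between re-sends of the same event,
# so they are left out of the comparison.
IGNORED_FIELDS = {"event_id", "dvce_sent_tstamp", "collector_tstamp"}


def fingerprint(event: dict) -> str:
    """Hash every field of the event except the ignored ones."""
    payload = {k: v for k, v in event.items() if k not in IGNORED_FIELDS}
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(a: dict, b: dict) -> str:
    """Return 'distinct', 'natural' or 'synthetic' for a pair of events."""
    if a["event_id"] != b["event_id"]:
        return "distinct"
    # Same event_id and same payload: the same event delivered twice.
    if fingerprint(a) == fingerprint(b):
        return "natural"
    # Same event_id but a different payload: possibly a different event.
    return "synthetic"


original = {
    "event_id": "e1",
    "dvce_sent_tstamp": "2017-01-01 10:00:00.000",
    "collector_tstamp": "2017-01-01 10:00:00.500",
    "page_url": "http://example.com/",
}
# A re-send: only the two timestamps differ.
resent = dict(
    original,
    dvce_sent_tstamp="2017-01-01 10:00:03.000",
    collector_tstamp="2017-01-01 10:00:03.400",
)
# Same event_id, but the payload was changed.
altered = dict(original, page_url="http://example.com/other")

print(classify(original, resent))   # natural
print(classify(original, altered))  # synthetic
```

Under these assumptions, a pair of events that differ only in their timestamps should always come out as natural duplicates, never synthetic ones.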
We deal with natural and synthetic duplicates in different ways:
- Natural (the ones you’re describing): the shred job (since R76) simply filters them out. This happens only inside a single batch (that’s why it’s called in-batch deduplication): if the next job encounters a duplicate of an event from a previous job, it doesn’t know it’s a duplicate, so it’ll appear in Redshift as well. This can be avoided since R88 by enabling cross-batch natural deduplication.
- Synthetic: since R86 they are assigned a new `event_id` and have a `duplicate` context attached. I believe this is the process you’re referring to?
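The two behaviours above can be sketched roughly as follows. This is a simplified illustration in Python, not the shred job's real implementation: it assumes every event already carries an `event_id` and an `event_fingerprint`, the `dedupe_batch` function and the `duplicate_context` / `original_event_id` names are hypothetical, and keeping the first event of a synthetic group unchanged is a simplification.

```python
import uuid


def dedupe_batch(events):
    """In-batch deduplication over a list of event dicts (sketch)."""
    seen = {}  # event_id -> set of fingerprints already seen in this batch
    out = []
    for event in events:
        eid = event["event_id"]
        fp = event["event_fingerprint"]
        if eid not in seen:
            # First time this event_id appears in the batch: keep as is.
            seen[eid] = {fp}
            out.append(dict(event))
        elif fp in seen[eid]:
            # Natural duplicate: same event_id and fingerprint, drop it.
            continue
        else:
            # Synthetic duplicate: same event_id, different fingerprint.
            # Give it a fresh event_id and remember the original one.
            seen[eid].add(fp)
            out.append(
                dict(
                    event,
                    event_id=str(uuid.uuid4()),
                    duplicate_context={"original_event_id": eid},
                )
            )
    return out


a = {"event_id": "e1", "event_fingerprint": "f1"}

# Both copies in one batch: the natural duplicate is filtered out.
print(len(dedupe_batch([a, dict(a)])))

# Copies in separate batches: each batch has no memory of the other,
# so both survive unless cross-batch deduplication is enabled.
print(len(dedupe_batch([a]) + dedupe_batch([dict(a)])))
```

Because `seen` lives only for the duration of one call, the sketch also shows why a duplicate arriving in a later batch slips through without some shared state between runs.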
So, what I don’t understand is why events where only two timestamps differ (and which are therefore natural duplicates) remain in `atomic.events` with a changed `event_id` (which means they were processed by synthetic deduplication), as you described.
Hope that clarifies the problem.