I’m using PhantomJS to ping the Snowplow pipeline we’ve set up (JS tracker on test website -> Scala collector -> Kinesis -> Scala Enricher -> Kinesis -> Proprietary data loader using the SDK to parse Kinesis records).
The “pinger” works in a loop: it sends a new request once the collector has sent a response back.
Lately, I’ve been seeing inconsistencies between the number of results sent and the number received, and on closer inspection the cause turned out to be duplicate "event_id"s.
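To make the check concrete, here is a minimal sketch of how duplicates like these can be surfaced. It assumes the parsed Kinesis records are available as Python dicts with an "event_id" key; the function name and the sample records are hypothetical and only for illustration, not the actual loader code.

```python
from collections import Counter


def duplicate_event_ids(records):
    """Return {event_id: count} for every event_id seen more than once."""
    counts = Counter(record["event_id"] for record in records)
    return {event_id: n for event_id, n in counts.items() if n > 1}


# Hypothetical sample: three parsed records, two of which share an event_id.
sample_records = [
    {"event_id": "id-aaa", "page_url": "http://example.com/a"},
    {"event_id": "id-bbb", "page_url": "http://example.com/b"},
    {"event_id": "id-aaa", "page_url": "http://example.com/a"},
]

duplicates = duplicate_event_ids(sample_records)
print(duplicates)
```

Comparing the total of the counts against the number of requests the pinger sent shows how much of the mismatch the duplicates account for.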
I’ve read the doc here: http://snowplowanalytics.com/blog/2015/08/19/dealing-with-duplicate-event-ids/#deduplicating-the-event-id , but I’d still like to ask how that is possible, given that the IDs are in essence UUIDs.
I’d like to hear your ideas.