Which timestamp is the best to see when an event occurred?



A common question among Snowplow users is ‘what timestamp should I use?’. While this depends on what you want to achieve, most users want to know when an event occurred. For that, the derived_tstamp is the best choice.

In this post we’ll explain the different timestamps, what they mean and why the derived_tstamp is generally the best one to use.

Available timestamps

These are the timestamps Snowplow uses:

  • collector_tstamp

  • dvce_created_tstamp

  • dvce_sent_tstamp

  • derived_tstamp

  • true_tstamp

The first three (collector_tstamp, dvce_created_tstamp and dvce_sent_tstamp) are used to calculate the derived_tstamp. We’ll explain each of them below, and why none of them is, by itself, the best choice for accurately seeing when an event happened. We’ll also show how the derived_tstamp is calculated and explain when to use the true_tstamp.

collector_tstamp

Timestamp for the event recorded by the collector

We can trust the collector timestamp to be accurate but it’s possible that there’s a delay between an event being created and the event arriving at the collector.

A classic example is when a device goes offline. New events will still be created, but they are cached in local storage until the connection is restored. When the connection is restored, all events in the cache are sent at once, so all events that were generated during this time will end up with the same collector timestamp.

The collector_tstamp is therefore not the best choice to see when an event happened or for building attribution models.

dvce_created_tstamp

Timestamp the event was recorded on the client device

This timestamp is created using the clock of the device. As a general rule, device clocks (which by definition are clocks that are not under our control) cannot be trusted to be accurate. What we can do is use them to calculate the relative time between events that were created on the same device.

dvce_sent_tstamp

Timestamp the event was sent by the client device

When an event is created it is not always sent immediately. Sometimes a device is offline and the event will be cached and sent once the connection is restored.

This timestamp has the same issue as the dvce_created_tstamp: we cannot trust the device clock to be accurate. It is reasonable, however, to assume that the clock is internally consistent. In other words, a 23-minute gap between the dvce_created_tstamp and the dvce_sent_tstamp really is 23 minutes.
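
To illustrate why the gap survives a wrong clock, here is a minimal Python sketch. It assumes the device clock is off by a constant amount (3 hours here, a made-up figure); all values and every variable name other than the two dvce_ fields are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical "real" times, which the device itself never knows.
real_created = datetime(2024, 1, 1, 12, 0, 0)
real_sent = real_created + timedelta(minutes=23)

# Assumption: the device clock runs a constant 3 hours ahead.
clock_offset = timedelta(hours=3)
dvce_created_tstamp = real_created + clock_offset
dvce_sent_tstamp = real_sent + clock_offset

# The same offset is in both timestamps, so it cancels in the difference.
gap = dvce_sent_tstamp - dvce_created_tstamp
print(gap)  # 0:23:00
```

Both device timestamps are wrong in absolute terms, yet the gap between them matches the real gap. This only holds if the clock is not adjusted between creating and sending the event.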

derived_tstamp

Timestamp making allowance for inaccurate device clock

We now know that:

  • The collector_tstamp is accurate but does not show when the event was created
  • The dvce_created_tstamp and the dvce_sent_tstamp are accurate in relation to each other

We can use this knowledge to calculate the time the event actually happened. We take the difference between the two device timestamps and subtract that delta from the collector_tstamp, like this:

derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)

This is why the derived_tstamp is the best timestamp for seeing when the event actually took place.
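
As a rough worked example, the Python sketch below applies the formula to the offline scenario described earlier: three events are created while the device is offline and then sent in one batch, so they share a collector_tstamp. The helper name `derive` and all the values are hypothetical, and the device clock is assumed to run 3 hours fast.

```python
from datetime import datetime

def derive(collector_tstamp, dvce_created_tstamp, dvce_sent_tstamp):
    """Collector time minus the time the event waited on the device."""
    return collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)

# Hypothetical batch: all three events reach the collector together.
collector_tstamp = datetime(2024, 1, 1, 12, 0, 0)   # trusted collector clock
dvce_sent_tstamp = datetime(2024, 1, 1, 15, 0, 0)   # device clock, 3 hours fast
created_on_device = [
    datetime(2024, 1, 1, 14, 37, 0),
    datetime(2024, 1, 1, 14, 45, 0),
    datetime(2024, 1, 1, 14, 59, 30),
]

derived = [derive(collector_tstamp, c, dvce_sent_tstamp) for c in created_on_device]
for d in derived:
    print(d.isoformat())
# 2024-01-01T11:37:00
# 2024-01-01T11:45:00
# 2024-01-01T11:59:30
```

Although the collector saw all three events at 12:00:00, the derived timestamps keep the original spacing between them and are expressed on the collector's clock rather than the device's.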

There are two exceptions:

  • If either of the dvce_ timestamps is not set, then derived_tstamp == collector_tstamp.
  • If the true_tstamp is set, then derived_tstamp == true_tstamp.

true_tstamp

User-set “true timestamp” for the event

The true_tstamp is a special timestamp that is only used in rare cases, most often when you want to ingest historical data. In that case, the collector timestamp is irrelevant (it would be set to the time of ingestion, not to the time the historical event happened).

In those cases, you can explicitly set the true timestamp, which will be passed on to the derived timestamp.

If the true timestamp is set, the pipeline will ignore all other inputs and set the derived timestamp to the true timestamp.

Full algorithm

Following on from the information above, this is the full algorithm for calculating the derived_tstamp:

Step 1
Check if the true_tstamp is set. If so, derived_tstamp = true_tstamp.

Step 2
Else, check if either dvce_sent_tstamp or dvce_created_tstamp is missing. If so, the derived_tstamp will simply be equal to the collector_tstamp.

Step 3
Else the derived_tstamp is calculated like this:

derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)
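
The three steps can be sketched in Python as follows. This is an illustration of the steps as described in this post, not the pipeline's actual code; the function name and signature are hypothetical, and a missing timestamp is assumed to be represented as None.

```python
from datetime import datetime
from typing import Optional

def derived_tstamp(
    collector_tstamp: datetime,
    dvce_created_tstamp: Optional[datetime] = None,
    dvce_sent_tstamp: Optional[datetime] = None,
    true_tstamp: Optional[datetime] = None,
) -> datetime:
    # Step 1: an explicitly set true_tstamp overrides everything else.
    if true_tstamp is not None:
        return true_tstamp
    # Step 2: without both device timestamps there is no delta to apply.
    if dvce_created_tstamp is None or dvce_sent_tstamp is None:
        return collector_tstamp
    # Step 3: subtract the time the event waited on the device.
    return collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)

# Hypothetical values for illustration.
collector = datetime(2024, 1, 1, 12, 0, 0)
created = datetime(2024, 1, 1, 9, 0, 0)
sent = datetime(2024, 1, 1, 9, 23, 0)
historical = datetime(2019, 6, 1, 8, 30, 0)

print(derived_tstamp(collector, created, sent, true_tstamp=historical))  # 2019-06-01 08:30:00
print(derived_tstamp(collector, created))                                # 2024-01-01 12:00:00
print(derived_tstamp(collector, created, sent))                          # 2024-01-01 11:37:00
```

The three calls exercise steps 1, 2 and 3 in turn. The order of the checks matters: the true_tstamp is tested first, so it wins even when both device timestamps are present.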

Trackers by timestamp capability

Here is the current snapshot of timestamp capabilities across all of our trackers. If this is out of date, please add a comment to this thread, and we will update the table!

Tracker               | dvce_created_tstamp | dvce_sent_tstamp | true_tstamp | derived_tstamp
----------------------|---------------------|------------------|-------------|-----------------
ActionScript3 Tracker | Yes                 | No               | No          | collector_tstamp
Arduino Tracker       | No                  | No               | No          | collector_tstamp
Android Tracker       | Yes                 | Yes              | Yes         | All steps
CPP Tracker           | Yes                 | Yes              | Yes         | All steps
Golang Tracker        | Yes                 | Yes              | Yes         | All steps
iOS Tracker           | Yes                 | Yes              | No          | Steps 2 & 3
Java Tracker          | Yes                 | No               | No          | collector_tstamp
JavaScript Tracker    | Yes                 | Yes              | Yes         | All steps
Lua Tracker           | Yes                 | No               | No          | collector_tstamp
.NET Tracker          | Yes                 | Yes              | Yes         | All steps
Node.js Tracker       | Yes                 | No               | No          | collector_tstamp
PHP Tracker           | Yes                 | No               | No          | collector_tstamp
Python Tracker        | Yes                 | Yes              | Yes         | All steps
Ruby Tracker          | Yes                 | Yes              | Yes         | All steps
Scala Tracker         | Yes                 | Yes              | Yes         | All steps
Unity Tracker         | Yes                 | Yes              | No          | Steps 2 & 3

The Pixel Tracker and the Google AMP Tracker do not use timestamps.