Javascript tracker: How to keep track of users when changing the cookie domain? [tutorial]



The Snowplow JavaScript Tracker lets you configure the domain that you want your first-party cookies to use. If you have a website that spans multiple subdomains, you probably want to track user behaviour across all those subdomains, rather than within each one individually.

Setting it up like this:

snowplow_name_here("newTracker", "cf", "d3rkrsqld9gmqf.cloudfront.net", {
  cookieDomain: ".mysite.com"
});

will make the tracker’s first-party cookies readable on www.mysite.com, blog.mysite.com, application.mysite.com and all other subdomains of your site, so a visitor keeps the same domain_userid as they move between them.
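A leading-dot cookieDomain reaches every subdomain because of how browsers decide whether to send a cookie. The sketch below simplifies the domain-matching rule from RFC 6265 (section 5.1.3) and is not Snowplow code. A cookie with a Domain attribute goes to that domain and any subdomain of it, and a leading dot is ignored. A cookie set without a Domain attribute is host-only and goes back only to the exact host that set it. The function name cookieIsSentTo and its arguments are hypothetical. The sketch ignores paths, the Secure flag, IP-address hosts and public-suffix checks.

```javascript
// Simplified RFC 6265 cookie-scoping check (ignores path, Secure flag,
// IP-address hosts and public-suffix rules). cookieDomain is the cookie's
// Domain attribute, or null when it was set without one (host-only cookie).
function cookieIsSentTo(requestHost, setByHost, cookieDomain) {
  const host = requestHost.toLowerCase();
  if (cookieDomain == null) {
    // Host-only cookie: only the exact host that set it gets it back.
    return host === setByHost.toLowerCase();
  }
  // A leading dot in the Domain attribute is ignored.
  const domain = cookieDomain.toLowerCase().replace(/^\./, "");
  // Domain-match: the host is the domain itself or a subdomain of it.
  return host === domain || host.endsWith("." + domain);
}

// Shared cookie (cookieDomain: ".mysite.com") set while visiting www.mysite.com
console.log(cookieIsSentTo("blog.mysite.com", "www.mysite.com", ".mysite.com")); // true
// Default, host-only cookie set on www.mysite.com
console.log(cookieIsSentTo("blog.mysite.com", "www.mysite.com", null)); // false
```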

By default, if you don’t set the cookie domain using the cookieDomain field of the argmap, your first-party cookies will not be given a domain. Browsers then treat them as host-only cookies, so a cookie set on www.mysite.com is not sent to blog.mysite.com. Many users who don’t have multiple subdomains keep the default settings when they first implement Snowplow tracking on their website. Later on, when their website grows to span multiple subdomains, they face an unappealing choice. They can track events separately on each subdomain (via separate tracker instances), or they can set the cookie domain, which will reset all their existing cookies. That could be a major one-time disruption to data analytics, because every visitor to the website will receive a new domain_userid.

The reset of the cookies (and hence of the domain_userid) is, alas, unavoidable. However, there is something you can do to mitigate the effects of the switch.

The default lifetime for first-party cookies set by the Snowplow tracker is 2 years, so existing cookies will stick around for some time yet (provided you don’t overwrite them). This means they can still be read, and the legacy domain_userid can be retrieved and stored safely in your database. That lets you identify existing users even after their identity is reset.
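As a rough planning aid, a legacy cookie stays readable until its lifetime runs out, counted from the last time the old tracker wrote it. The sketch below assumes the 2-year default, taken here as 2 × 365 days (63,072,000 seconds). It also assumes that the old tracker refreshed the cookie’s expiry on each tracked visit, so once you switch to a new cookie name the old cookie stops being refreshed. Check both assumptions against your tracker version. The function name legacyCookieExpiry is hypothetical.

```javascript
// Assumed default lifetime of the tracker's first-party ID cookie:
// 2 years, taken as 2 * 365 days (63,072,000 seconds).
const DEFAULT_COOKIE_LIFETIME_SECONDS = 2 * 365 * 24 * 60 * 60;

// Returns the Date after which a legacy cookie last written at lastWrite
// will have expired and can no longer be read.
function legacyCookieExpiry(lastWrite, lifetimeSeconds = DEFAULT_COOKIE_LIFETIME_SECONDS) {
  return new Date(lastWrite.getTime() + lifetimeSeconds * 1000);
}

// A visitor last seen by the old tracker on 1 March 2024 (UTC)
const expiry = legacyCookieExpiry(new Date(Date.UTC(2024, 2, 1)));
console.log(expiry.toISOString()); // 2026-03-01T00:00:00.000Z
```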

Here’s how to do it.

1. Give your cookies a new name

The first thing to pay attention to is whether you have been setting the cookie name for the tracker instance. That is done by using the cookieName field of the argmap:

snowplow_name_here("newTracker", "cf", "d3rkrsqld9gmqf.cloudfront.net", {
  cookieName: "cf"
});

That setting would have given your domain cookie a name of the form “cfid.<hash>”, where <hash> is a short hexadecimal hash added by the tracker. If you have not explicitly set the cookie name, the prefix defaults to “_sp_”, giving “_sp_id.<hash>”. Either way, you will need to give your new cookies a new name. Otherwise, the new cookie may overwrite the old one or be indistinguishable from it, and you will not be able to salvage the legacy domain_userid:

snowplow_name_here("newTracker", "cf", "d3rkrsqld9gmqf.cloudfront.net", {
  cookieName: "cf_new"
});

This will ensure that both your old and your new cookies are on the user’s device and their values can be retrieved.
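With the renamed tracker in place, a visitor can carry both ID cookies at once. The sketch below lists every cookie whose name has the form prefix + "id." + hex hash, which is the same pattern the getDomainUserId helper in step 4 relies on. You can use it in the browser console, passing document.cookie, to check that both the old and the new cookie are present. The function listIdCookies, the sample cookie string and its values are made up for illustration.

```javascript
// Lists Snowplow-style ID cookies (named <prefix>id.<hex hash>) found in a
// document.cookie-style string, grouped by the given prefixes.
function listIdCookies(cookieString, prefixes) {
  const found = {};
  for (const prefix of prefixes) found[prefix] = [];
  for (const pair of cookieString.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip empty or malformed entries
    const name = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    for (const prefix of prefixes) {
      // The name must be exactly prefix + "id." + one or more hex digits,
      // so "cf" does not pick up the "cf_new" cookie.
      const rest = name.startsWith(prefix + "id.") ? name.slice(prefix.length + 3) : "";
      if (/^[a-f0-9]+$/.test(rest)) found[prefix].push({ name, value });
    }
  }
  return found;
}

// Made-up cookies: the old one (prefix "cf") and the new one ("cf_new")
const jar = "cfid.1a2b=aaaa-1111.1500000000; cf_newid.3c4d=bbbb-2222.1600000000";
console.log(listIdCookies(jar, ["cf", "cf_new"]));
```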

2. Write your schema

You can send the cookie values to your pipeline either as a custom context (attached to every event) or as a custom self-describing event sent once per page view. (More on making that choice in a little bit.)

In both cases, your schema will be something along the lines of:

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Example description",
  "self": {
    "vendor": "com.examplecompany",
    "name": "snowplow_cookies",
    "format": "jsonschema",
    "version": "1-0-0"
  },

  "type": "object",
  "properties": {
    "new_domain_userid": {
      "type": ["string", "null"]
    },
    "old_domain_userid": {
      "type": ["string", "null"]
    }
  },
  "additionalProperties": false
}
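Before tracking anything, you can check the payload in the browser against the same constraints the schema declares. That way a malformed event is caught early rather than failing validation in the pipeline. The sketch below hand-codes those constraints: only the two listed properties are allowed, and each must be a string or null. It is not a general JSON Schema validator, and isValidCookiePayload is a hypothetical name.

```javascript
// Hand-coded mirror of the snowplow_cookies 1-0-0 schema: an object whose
// only allowed properties are new_domain_userid and old_domain_userid,
// each either a string or null ("additionalProperties": false).
const ALLOWED_PROPERTIES = ["new_domain_userid", "old_domain_userid"];

function isValidCookiePayload(payload) {
  if (payload === null || typeof payload !== "object" || Array.isArray(payload)) {
    return false;
  }
  // Neither property is required, so an empty object is valid.
  return Object.keys(payload).every(
    (key) =>
      ALLOWED_PROPERTIES.includes(key) &&
      (payload[key] === null || typeof payload[key] === "string")
  );
}

console.log(isValidCookiePayload({ new_domain_userid: "abc", old_domain_userid: null })); // true
console.log(isValidCookiePayload({ new_domain_userid: 0 })); // false: 0 is not a string or null
```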

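Lint the schema with igluctl and generate the corresponding JSON path files and SQL table definitions. Publish the changes and create the table that will store the data in your database. Make sure to authorize the pipeline user to write to the new table, for example: ALTER TABLE atomic.com_examplecompany_snowplow_cookies_1 OWNER TO <pipeline_user>; (igluctl derives the table name from the schema’s vendor, name and major version, so check the generated SQL file for the exact name.)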
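The ALTER TABLE statement needs the right table name, so it helps to know how igluctl usually names shredded tables: <schema>.<vendor>_<name>_<model>. The sketch below reproduces that convention under stated assumptions. Dots and hyphens become underscores, the name is assumed to be snake_case already, and camelCase is not converted. The function shreddedTableName is hypothetical, and the SQL file igluctl actually generates remains the authority.

```javascript
// Derives the table name that igluctl conventionally generates for a schema:
// <dbSchema>.<vendor>_<name>_<model>. Assumes vendor and name are lowercase
// snake_case apart from dots and hyphens; camelCase is not converted here.
function shreddedTableName(igluUri, dbSchema = "atomic") {
  const match = /^iglu:([^\/]+)\/([^\/]+)\/jsonschema\/(\d+)-\d+-\d+$/.exec(igluUri);
  if (!match) throw new Error("Unexpected Iglu URI: " + igluUri);
  const [, vendor, name, model] = match;
  const clean = (s) => s.toLowerCase().replace(/[.-]/g, "_");
  return `${dbSchema}.${clean(vendor)}_${clean(name)}_${model}`;
}

console.log(shreddedTableName("iglu:com.examplecompany/snowplow_cookies/jsonschema/1-0-0"));
// atomic.com_examplecompany_snowplow_cookies_1
```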

3. Choose whether to use a custom context or a custom self-describing event

A context sent with every event makes intuitive sense, but a couple of factors make a self-describing event the more lightweight and pragmatic approach.

The first of those is timing. The Snowplow tracker must have enough time to load and set the cookies before you read them, and reading the cookies must happen before any events are sent. If you read the cookies before the tracker has had a chance to set them, you will get outdated or missing information. If you send an event before the context has been assembled, that event will be missing the context. So you’ll have to get the timing just right, which could be daunting.
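If you do go the context route, or simply want to be defensive, one way to respect that ordering is to retry the cookie read until the new cookie exists. Only then do you run anything that depends on it. The sketch below is a generic retry loop, not a Snowplow API. The function whenIdsReady, the attempt limit and the delay are hypothetical, and the scheduler is injectable so the logic can be exercised without real timers.

```javascript
// Calls readIds() until it returns a non-null result, waiting delayMs
// between attempts, then passes the result to onReady. Gives up (and passes
// null) after maxAttempts. schedule defaults to setTimeout.
function whenIdsReady(readIds, onReady, options = {}) {
  const {
    maxAttempts = 20,
    delayMs = 100,
    schedule = (fn, ms) => setTimeout(fn, ms),
  } = options;
  let attempt = 0;
  function tryRead() {
    attempt += 1;
    const ids = readIds();
    if (ids !== null) {
      onReady(ids);
    } else if (attempt >= maxAttempts) {
      onReady(null); // the cookie never appeared; let the caller decide
    } else {
      schedule(tryRead, delayMs);
    }
  }
  tryRead();
}

// Demonstration with a fake reader that only "sees" the cookie on attempt 3
let calls = 0;
whenIdsReady(
  () => (++calls >= 3 ? { new_domain_userid: "abc" } : null),
  (ids) => console.log(calls, ids),
  { schedule: (fn) => fn() } // run retries immediately, no real timers
);
```

In a browser, readIds would wrap a cookie reader such as getDomainUserId from step 4 and return null until the new cookie is present.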

By contrast, with a self-describing event you just need to make sure the event fires once per page view. You will still have to ensure it fires after the tracker has had time to set the cookies, but that is much easier to do, especially if you are using a tag manager such as Google Tag Manager (GTM). In GTM, you can configure your custom event to fire on a Window Loaded trigger, which fires once the page and its scripts have finished loading. Other events might still fire before your custom cookie event, but that is all right.
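Your trigger might fire more than once on the same page, for example because a tag is attached to several triggers. A small guard keeps the cookie event to one per page view. This is a hypothetical helper, not part of the Snowplow tracker or GTM. The flag lives in the page’s JavaScript memory, so a full page load starts fresh. On a single-page site you would need to create a new guard for each virtual page view.

```javascript
// Wraps fn so that it runs at most once; later calls are ignored and
// return false. Because the flag lives in memory, a full page load
// (a new page view) starts with a fresh guard.
function oncePerPageView(fn) {
  let fired = false;
  return function (...args) {
    if (fired) return false;
    fired = true;
    fn.apply(this, args);
    return true;
  };
}

// Hypothetical usage: wrap the function that sends the cookie event
const sendCookieEvent = oncePerPageView(() => console.log("cookie event sent"));
sendCookieEvent(); // logs once, returns true
sendCookieEvent(); // ignored, returns false
```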

The reason that’s all right is that if you have the webPage context enabled (and if you don’t, what are you waiting for?!), every event already carries a UUID shared by the page view and all other events (such as page pings) from that page visit. So you can already identify which events belong to the same page visit.
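Once the data is in your database, that shared page view ID lets you attach the cookie event’s IDs to every other event from the same page view. In practice you would probably do this with a SQL join on the web page context table. The JavaScript sketch below only shows the logic on in-memory rows. The row fields (page_view_id, event_name, old_domain_userid) and the function attachLegacyIds are hypothetical, not actual Snowplow table columns.

```javascript
// Given rows for many events, copy old_domain_userid from each page view's
// cookie event (event_name "snowplow_cookies") onto every row that shares
// its page_view_id. Rows without a matching cookie event get null.
function attachLegacyIds(rows) {
  const legacyByPageView = new Map();
  for (const row of rows) {
    if (row.event_name === "snowplow_cookies") {
      legacyByPageView.set(row.page_view_id, row.old_domain_userid);
    }
  }
  return rows.map((row) => ({
    ...row,
    old_domain_userid: legacyByPageView.has(row.page_view_id)
      ? legacyByPageView.get(row.page_view_id)
      : null,
  }));
}

const rows = [
  { page_view_id: "pv-1", event_name: "page_view" },
  { page_view_id: "pv-1", event_name: "page_ping" },
  { page_view_id: "pv-1", event_name: "snowplow_cookies", old_domain_userid: "legacy-123" },
  { page_view_id: "pv-2", event_name: "page_view" },
];
console.log(attachLegacyIds(rows).map((r) => r.old_domain_userid));
// [ 'legacy-123', 'legacy-123', 'legacy-123', null ]
```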

The second reason a self-describing event might be better than a context is traffic. Sending the cookie data only once per page view, rather than with each and every event, keeps the extra load on your website and pipeline small.
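A back-of-the-envelope comparison makes the difference concrete. The sketch below measures the cookie payload as UTF-8 JSON and compares two cases. As a context, the payload rides on every event. As a self-describing event, it is sent once but adds one extra event request. The event count, the ID values and the 500-byte per-event overhead are made up. The sketch also ignores Iglu wrappers and base64 encoding, so treat the numbers as relative rather than exact.

```javascript
// Size in bytes of a value serialized as UTF-8 JSON.
function jsonBytes(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Extra bytes sent per page view under each approach.
// eventOverheadBytes is the fixed cost of one extra event request; it is an
// assumption here and should be measured on your own site.
function extraBytesPerPageView(payloadBytes, eventsPerPageView, eventOverheadBytes) {
  return {
    context: payloadBytes * eventsPerPageView, // payload repeated on every event
    selfDescribing: payloadBytes + eventOverheadBytes, // one extra event
  };
}

const payload = {
  new_domain_userid: "00000000-0000-4000-8000-000000000001",
  old_domain_userid: "00000000-0000-4000-8000-000000000002",
};
// Made-up page view: 1 page view + 10 page pings, ~500 bytes per extra event
console.log(extraBytesPerPageView(jsonBytes(payload), 11, 500));
```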

4. Write the code that will generate the data for your custom event and track it

In JavaScript that code might look something like this:

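// fetch inputs for Snowplow

// Returns the domain_userid stored in the tracker's first-party ID cookie,
// whose name has the form <cookieName>id.<hash>, or null if there is none.
function getDomainUserId(cookieName) {
    // Anchor at the start of the cookie string or after "; " so the prefix
    // cannot match the tail of a longer cookie name (e.g. "abcfid.1a2b").
    var matcher = new RegExp('(?:^|;\\s*)' + cookieName + 'id\\.[a-f0-9]+=([^;]+)');
    var match = document.cookie.match(matcher);
    if (match && match[1]) {
        // The cookie value is dot-separated; the first field is the domain_userid.
        return match[1].split('.')[0];
    } else {
        return null;
    }
}

// create event JSON

// The cookieName your tracker used before the switch:
// 'cf' in this example, or '_sp_' if you never set cookieName.
var OLD_COOKIE_NAME = 'cf';

var snowplow_cookies = {
  new_domain_userid: getDomainUserId('cf_new'),
  old_domain_userid: getDomainUserId(OLD_COOKIE_NAME)
};

// track the event (only after the tracker has had time to set its cookies)

window.snowplow_name_here("trackSelfDescribingEvent", {
  schema: 'iglu:com.examplecompany/snowplow_cookies/jsonschema/1-0-0',
  data: snowplow_cookies
});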

Enjoy being able to track events across all your subdomains without having to lose all the history you already have for your known users.