Global Contexts RFC



As part of our drive to make the Snowplow community more collaborative and widen our network of open source contributors, we will be regularly posting our proposals for new features, apps and libraries under the Request for Comments section of our forum. You are welcome to post your own proposals too!

This Request for Comments proposes giving users of the JavaScript Tracker the ability to define custom contexts globally. By this we mean specifying at the tracker level which contexts are sent, rather than attaching them to each individual Snowplow event. Whilst this RFC only covers the JavaScript Tracker, we plan to roll out the agreed functionality to our iOS and Android trackers afterwards, so please feel free to raise mobile platform considerations in the discussion.

Why release global contexts?

For some time a number of our customers have asked, both implicitly and explicitly, for more control over the way custom contexts are sent with events (both custom and first-class). Whilst “more control” is quite vague, the issues raised and the conversations we’ve had point to three main problems:

Issue 1: Can’t send custom context arrays with all events (see JS issue #405)

It’s common to require a context to be sent with every event type (both custom events and first-class Snowplow events such as page views, page pings, etc.). Whilst it’s possible to send some of our predefined contexts globally (e.g. client_session_context and mobile_context), there is no way to do this for custom contexts at the tracker level.

By sending contexts “at the tracker level”, we mean defining contexts once, before any track methods are called, rather than passing them into each individual track call.

Issue 2: Can’t send custom context arrays for all events of a specific type (see JS issue #405)

This issue is similar to issue 1 but the requirement is slightly different. It’s not uncommon to need to send a context with a specific type of event. For example, let’s say that I use Snowplow to track multiple websites and have defined a group of events related to user engagement (likes, comments, uploads) on just one of those sites. There is currently no way to send a specific array of contexts with all engagement events on that one site via a single tracker setting.

Issue 3: Can’t send custom context arrays dynamically based on specific event data (see JS issues #519 and #585)

The final issue stems from the need to send a context only when an event’s data meets certain conditions. For example, let’s say that I want to send a context with user metadata only when specific links are clicked on my website. There is currently no functionality that lets users inspect the tracker payload and conditionally send a context based on the event data.

Current solutions

The current workarounds for these issues are time-consuming. Ensuring that the right contexts are sent with each event type requires a strict implementation process, which can mean extra time reviewing the code for each event or writing complex validation queries against the data warehouse. In a tag manager setup, a common workaround is custom JavaScript that dynamically builds a context array from objects in the data layer. For customers who send many different event types, each with its own varying context array, fully QA-ing an implementation isn’t realistic.

A proposal for releasing global contexts

The following section outlines potential features that would enable users of the JavaScript Tracker to:

  • Design goal 1: Send a context array with all events
  • Design goal 2: Send a context array for specific event types
  • Design goal 3: Send a context array for events that satisfy certain conditions

To achieve these goals, we propose a new tracker method, addGlobalContext, built on two new concepts:

Introducing context primitives

The building blocks of global contexts are called context primitives. A context primitive is either a self-describing JSON or a function that returns a self-describing JSON; we refer to the latter as a context generator. A context generator is evaluated for each event with three arguments: the event payload, the event type and the event schema. When a context primitive is the only argument passed to the method, it is sent with every event.
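As a minimal sketch of how a tracker might resolve a context primitive when an event fires: the helper resolveContextPrimitive below is hypothetical and not part of the proposed API, the com.acme schemas are made up for illustration, and the argument order (payload, event type, schema) follows the examples in this post.

```javascript
// Hypothetical helper (not part of the proposed API): turns a context
// primitive into a self-describing JSON for one event.
function resolveContextPrimitive(primitive, payload, eventType, schema) {
  if (typeof primitive === 'function') {
    // Context generator: evaluated per event with that event's data
    return primitive(payload, eventType, schema);
  }
  // Plain self-describing JSON: attached as-is
  return primitive;
}

// Illustrative constant context (com.acme schemas are made up)
var constantContext = {
  schema: 'iglu:com.acme/example_context/jsonschema/1-0-0',
  data: { source: 'static' }
};

// Illustrative generator that copies the app ID ('aid') into its data
function appIdGenerator(payload, eventType, schema) {
  return {
    schema: 'iglu:com.acme/app_context/jsonschema/1-0-0',
    data: { appId: payload['aid'], eventType: eventType }
  };
}

// Schema argument omitted for this page view example
var payload = { e: 'pv', aid: 'web_app' };
var resolvedConstant = resolveContextPrimitive(constantContext, payload, 'pv', undefined);
var resolvedGenerated = resolveContextPrimitive(appIdGenerator, payload, 'pv', undefined);
console.log(resolvedGenerated.data.appId); // web_app
```

Passing the same three arguments to every generator keeps generators interchangeable with the filter functions described later, which receive the same data.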

Example 1: Send a context with all events using a context primitive (to solve design goal 1)

Below is an example of what the method would look like where a geolocation context is sent with all events of all types. The context primitive is a self-describing JSON in this example:

Note: this assumes the JavaScript Tracker has already been loaded and initialised before the following code runs:

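```javascript
var geolocationContext = {
  schema: 'iglu:com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-1-0',
  data: {
    latitude: 40.0,   // the schema expects numbers, not strings
    longitude: 55.1
  }
};

// A single context primitive as the only argument: sent with every event
window.Snowplow('addGlobalContext', geolocationContext);
```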
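Conceptually, a context primitive added without a condition is appended to the context array of every event. The sketch below models this with a hypothetical in-memory registry (createGlobalContextRegistry and contextsFor are illustrative names, not tracker internals); appending global contexts after the event’s own contexts is an assumption.

```javascript
// Hypothetical registry modelling design goal 1: primitives added without a
// condition are attached to every event. Ordering (event contexts first,
// then global ones) is an assumption for illustration.
function createGlobalContextRegistry() {
  var primitives = [];
  return {
    addGlobalContext: function (primitive) {
      primitives.push(primitive);
    },
    // Builds the full context array for one event
    contextsFor: function (eventContexts, payload, eventType, schema) {
      var globals = primitives.map(function (p) {
        return typeof p === 'function' ? p(payload, eventType, schema) : p;
      });
      return eventContexts.concat(globals);
    }
  };
}

var registry = createGlobalContextRegistry();
registry.addGlobalContext({
  schema: 'iglu:com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-1-0',
  data: { latitude: 40.0, longitude: 55.1 }
});

// A page view with no contexts of its own
var pageViewContexts = registry.contextsFor([], { e: 'pv' }, 'pv');

// A structured event that already carries a (made-up) user context
var structContexts = registry.contextsFor(
  [{ schema: 'iglu:com.acme/user/jsonschema/1-0-0', data: { id: 'u1' } }],
  { e: 'se' },
  'se'
);
```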

Example 2: Send a generated context with all events using a context primitive

In the next example the context primitive takes the form of a context generator, i.e. a function which returns a self-describing JSON. The generator inspects the event payload and returns a desktop_context if the app ID (the aid field) is web_app, and a mobile_context otherwise:

```javascript
// Context generator: receives the event payload, event type and schema
// and returns a self-describing JSON chosen according to the app ID
function platformContextGenerator(payload, eventType, schema) {
  // 'aid' is the tracker protocol field holding the app_id
  if (payload['aid'] === 'web_app') {
    return {
      schema: 'iglu:com.snowplowanalytics.snowplow/desktop_context/jsonschema/1-0-0',
      data: {
        osType: window.navigator.platform,
        osVersion: window.navigator.oscpu // non-standard property, not available in every browser
      }
    };
  }
  // getDevice() stands for a device-detection helper the user must supply
  var device = getDevice();
  return {
    schema: 'iglu:com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-0-1',
    data: {
      osType: window.navigator.platform,
      osVersion: window.navigator.oscpu,
      deviceManufacturer: device.vendor,
      deviceModel: device.model
    }
  };
}

window.Snowplow('addGlobalContext', platformContextGenerator);
```

Introducing context providers

Contexts can also be sent only when certain event criteria are met. To send contexts conditionally, we supply the tracker with a context provider. A context provider has two parts:

  1. A condition defining when the context is sent
  2. The context primitive that will be sent

Context providers can be used in two ways:

Context providers: Matching against certain schemas (to solve design goal 2)

When matching against event schemas, we refer to the context provider as a “path context provider”. The conditional part of the provider is a rule set: any number of rules (regex strings) that are matched against the event’s schema URI.

A rule set must be composed of valid Iglu schema URIs of the form iglu:vendor/name/format/version, in which any part of the URI can be referenced. A period can be used in place of a field to make the rule less strict: for example, accepting iglu:com.mailchimp/././. sends a context with any event whose schema begins with iglu:com.mailchimp/, whatever its name, format and version.
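Although the rules are described as regex strings, the examples read most naturally as field-by-field wildcards. This sketch takes that interpretation, assuming a field consisting of a single period matches any value while every other field must match exactly; matchesIgluPattern is a hypothetical name, not part of the proposed API.

```javascript
// Hypothetical matcher, assuming a field of exactly '.' matches any value and
// every other field must match exactly. Periods inside a field
// (e.g. 'com.mailchimp') are treated literally.
function matchesIgluPattern(pattern, schemaUri) {
  var prefix = 'iglu:';
  if (pattern.indexOf(prefix) !== 0 || schemaUri.indexOf(prefix) !== 0) {
    return false;
  }
  // Split into vendor/name/format/version
  var patternParts = pattern.slice(prefix.length).split('/');
  var schemaParts = schemaUri.slice(prefix.length).split('/');
  if (patternParts.length !== 4 || schemaParts.length !== 4) {
    return false; // not a well-formed Iglu URI
  }
  return patternParts.every(function (part, i) {
    return part === '.' || part === schemaParts[i];
  });
}

var vendorRule = 'iglu:com.mailchimp/././.';
matchesIgluPattern(vendorRule, 'iglu:com.mailchimp/subscribe/jsonschema/1-0-0'); // true
matchesIgluPattern(vendorRule, 'iglu:com.acme/subscribe/jsonschema/1-0-0');      // false
```

Treating only a whole-field period as a wildcard avoids the ambiguity a true regex would have here, since vendor names such as com.mailchimp contain periods themselves.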

Example 1: Simple schema match with path context provider

In the example below, the context provider takes the “path context provider” form since we are matching against the schema path. It sends a context with every Snowplow event whose schema vendor is com.mailchimp. The contextGenerator variable can be either a self-describing JSON or a function which returns one:

```javascript
window.Snowplow('addGlobalContext', {accept: 'iglu:com.mailchimp/././.'}, contextGenerator);
```

Example 2: Filtering on the start of a schema path with path context provider

In the example below, the path context provider sends a context with every event whose schema does not belong to the com.mailchimp vendor; the leading ! negates the rule.

```javascript
window.Snowplow('addGlobalContext', {accept: '!iglu:com.mailchimp/././.'}, contextGenerator);
```

Context providers: Matching against certain event data (to solve design goal 3)

When matching against event data, we refer to the context provider as a “filter context provider”. The conditional part of the provider is a user-supplied function that receives the event payload, event type and schema, and returns a boolean; the context is sent only when it returns true.

Example: Filter context provider

In the example below, the context provider takes the “filter context provider” form, since whether the context is sent depends on the output of a filter function. The filter function inspects the event data and applies an expression to it: if it returns true the context is sent, and if it returns false it is not. Here, structuredEventFilter is used to send a context (or the output of a context generator) with all structured events, i.e. events of type se.

```javascript
function structuredEventFilter(payload, eventType, schema) {
  return eventType === 'se';
}

window.Snowplow('addGlobalContext', structuredEventFilter, contextGenerator);
```
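A filter context provider can be thought of as a gate in front of its context primitives. In this minimal sketch, applyFilterProvider is a hypothetical helper and the com.acme user schema is made up; treating any return value other than true as false is an assumption consistent with filters being boolean functions.

```javascript
// Hypothetical evaluation of a filter context provider: the filter decides
// whether the provider's context primitives are attached to this event.
function applyFilterProvider(filter, primitives, payload, eventType, schema) {
  if (filter(payload, eventType, schema) !== true) {
    return []; // filter rejected the event (or returned a non-boolean)
  }
  // Resolve each primitive: call generators, pass plain contexts through
  return primitives.map(function (p) {
    return typeof p === 'function' ? p(payload, eventType, schema) : p;
  });
}

function structuredEventFilter(payload, eventType, schema) {
  return eventType === 'se';
}

// Made-up user context for illustration
var userContext = {
  schema: 'iglu:com.acme/user/jsonschema/1-0-0',
  data: { id: 'u1' }
};

var forStructEvent = applyFilterProvider(structuredEventFilter, [userContext], { e: 'se' }, 'se');
var forPageView = applyFilterProvider(structuredEventFilter, [userContext], { e: 'pv' }, 'pv');
```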

So far this RFC has covered the simplest applications of the proposed features. With global contexts we also want users to be able to combine methods and to define their own filter and context generator functions, which opens up many new possibilities.

Further path context provider examples

```javascript
// Only attaches contexts to this one schema
var ruleSetAcceptOne = {
  accept: ['iglu:com.mailchimp/cleaned_email/jsonschema/1-0-0']
};

// Only attaches contexts to these schemas
var ruleSetAcceptTwo = {
  accept: ['iglu:com.mailchimp/cleaned_email/jsonschema/1-0-0',
           'iglu:com.mailchimp/subscribe/jsonschema/1-0-0']
};

// Only attaches contexts to schemas with the mailchimp vendor
var ruleSetAcceptVendor = {
  accept: ['iglu:com.mailchimp/././.']
};

// Only attaches contexts to schemas that don't have the mailchimp vendor
var ruleSetRejectVendor = {
  reject: ['iglu:com.mailchimp/././.']
};

// Only attaches contexts to Snowplow first-class events
var ruleSetFirstClass = {
  accept: ['iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4']
};
```
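The accept and reject rule sets above could be evaluated along the following lines. This is a sketch under stated assumptions: accept and reject may each hold one rule or an array of rules, an event must match at least one accept rule (when any are given) and no reject rule, and a field of a single period is a wildcard. ruleSetMatches and fieldMatch are hypothetical names, and the ! prefix form is not modelled.

```javascript
// Hypothetical field-by-field matcher: '.' as a whole field matches anything
function fieldMatch(rule, schemaUri) {
  var ruleParts = rule.replace(/^iglu:/, '').split('/');
  var uriParts = schemaUri.replace(/^iglu:/, '').split('/');
  return ruleParts.length === 4 && uriParts.length === 4 &&
    ruleParts.every(function (part, i) {
      return part === '.' || part === uriParts[i];
    });
}

// Accept a single rule or an array of rules (assumption)
function toArray(rules) {
  if (rules === undefined) return [];
  return Array.isArray(rules) ? rules : [rules];
}

// Hypothetical rule set evaluation: must match an accept rule (if any are
// given) and must not match any reject rule
function ruleSetMatches(ruleSet, schemaUri) {
  var accept = toArray(ruleSet.accept);
  var reject = toArray(ruleSet.reject);
  var accepted = accept.length === 0 || accept.some(function (r) {
    return fieldMatch(r, schemaUri);
  });
  var rejected = reject.some(function (r) {
    return fieldMatch(r, schemaUri);
  });
  return accepted && !rejected;
}

var subscribe = 'iglu:com.mailchimp/subscribe/jsonschema/1-0-0';
var linkClick = 'iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1';
ruleSetMatches({ accept: ['iglu:com.mailchimp/././.'] }, subscribe); // true
ruleSetMatches({ reject: ['iglu:com.mailchimp/././.'] }, subscribe); // false
ruleSetMatches({ reject: ['iglu:com.mailchimp/././.'] }, linkClick); // true
```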

Further filter context provider examples

```javascript
// A filter that only attaches contexts to structured events
function structuredEventFilter(payload, eventType, schema) {
  return eventType === 'se';
}

// A filter that only attaches contexts if a certain payload value is present
function payloadFilter(payload, eventType, schema) {
  return payload['data'] !== undefined &&
    payload['data']['dataOfInterest'] === 'valueWeExpect';
}

// A filter that only attaches contexts to page views from the app with ID 'ABC'
function pageViewFilter(payload, eventType, schema) {
  return eventType === 'pv' && payload['aid'] === 'ABC';
}
```

Example syntax for combining methods

```javascript
// A context generator and a constant, predefined context, both sent with all events
window.Snowplow('addGlobalContext', [contextGenerator, geolocationContext]);

// Attach two contexts (one generated, one constant) to all events with the mailchimp vendor
window.Snowplow('addGlobalContext', {accept: 'iglu:com.mailchimp/././.'}, [contextGenerator, geolocationContext]);

// Only attach these contexts to structured events
window.Snowplow('addGlobalContext', structuredEventFilter, [contextGenerator, geolocationContext]);
```

Syntax

```
window.Snowplow('addGlobalContext', [ruleSet | filterFunction,] myContext | myContextGenerator | arrayOfContextPrimitives);
```
  • addGlobalContext: Tells the tracker to add a global context.
  • ruleSet (optional): An object whose accept and/or reject keys hold a regex or an array of regexes, each matched against an event’s schema, which tells the method which events the contexts should (or should not) be attached to.
  • filterFunction (optional): Any JS function which reads the event payload, event type and schema and returns a boolean indicating whether the contexts will be attached to the event.
  • If neither a ruleSet nor a filterFunction is given, the contexts are attached to all events.
  • myContext: A custom context (self-describing JSON).
  • myContextGenerator: Any JS function which reads the event payload, event type and schema and returns a custom context, or an array of custom contexts (empty if nothing should be attached).
  • arrayOfContextPrimitives: An array mixing custom contexts and context generators.
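One way a tracker could interpret the argument forms above is to normalise every call into a provider object. toProvider and the kind values are hypothetical; treating a bare regex or array of regexes as an accept rule set is an assumption. Calling addGlobalContext several times simply accumulates providers, which is how multiple global contexts would combine.

```javascript
// Hypothetical normalisation of addGlobalContext arguments into providers:
//   (primitives), (ruleSet, primitives) or (filterFunction, primitives),
// where primitives is one context primitive or an array of them.
function toProvider(args) {
  var condition = args.length > 1 ? args[0] : null;
  var primitives = args[args.length - 1];
  primitives = Array.isArray(primitives) ? primitives : [primitives];

  if (condition === null) {
    return { kind: 'always', primitives: primitives };
  }
  if (typeof condition === 'function') {
    return { kind: 'filter', filter: condition, primitives: primitives };
  }
  if (typeof condition === 'string' || Array.isArray(condition)) {
    // Assumption: a bare rule or array of rules behaves like { accept: rules }
    return { kind: 'path', ruleSet: { accept: condition }, primitives: primitives };
  }
  return { kind: 'path', ruleSet: condition, primitives: primitives };
}

// Each call adds one provider; providers combine by accumulation
var providers = [];
function addGlobalContext() {
  providers.push(toProvider(Array.prototype.slice.call(arguments)));
}

var geo = {
  schema: 'iglu:com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-1-0',
  data: { latitude: 40.0, longitude: 55.1 }
};
function structuredEventFilter(payload, eventType) {
  return eventType === 'se';
}

addGlobalContext(geo);
addGlobalContext({ accept: 'iglu:com.mailchimp/././.' }, [geo]);
addGlobalContext(structuredEventFilter, geo);
```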

Discussion

The following section covers the most contentious decisions that we’d like readers of this RFC to weigh in on:

Errors with filterFunction or myContextGenerator

As explained previously, filterFunction and myContextGenerator need to return a boolean and a self-describing JSON respectively. If a user defines functions that don’t return values of the proper type, the JavaScript Tracker will still send the event, but without the affected contexts. We’ve opted for this approach over not sending the event at all because it will be more straightforward to debug.
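The proposed behaviour, dropping bad contexts but still sending the event, might look something like this. safeContextsFor and isSelfDescribingJson are hypothetical helpers, the com.acme schema is made up, and logging via console.warn is an assumption about how failures would surface.

```javascript
// Hypothetical check that a value looks like a self-describing JSON
function isSelfDescribingJson(value) {
  return value !== null && typeof value === 'object' &&
    typeof value.schema === 'string' && value.schema.indexOf('iglu:') === 0 &&
    value.data !== null && typeof value.data === 'object';
}

// Hypothetical sketch: a throwing or non-boolean filter drops its contexts,
// a throwing or invalid generator drops only its own context. The caller
// still sends the event with whatever contexts survive.
function safeContextsFor(filter, generators, payload, eventType, schema) {
  try {
    if (filter && filter(payload, eventType, schema) !== true) {
      return []; // filter said no, or returned a non-boolean
    }
  } catch (e) {
    console.warn('Global context filter threw; skipping its contexts', e);
    return [];
  }
  var contexts = [];
  generators.forEach(function (generate) {
    try {
      var ctx = generate(payload, eventType, schema);
      if (isSelfDescribingJson(ctx)) {
        contexts.push(ctx);
      } else {
        console.warn('Context generator returned an invalid context; skipping it');
      }
    } catch (e) {
      console.warn('Context generator threw; skipping it', e);
    }
  });
  return contexts;
}

function goodGenerator() {
  return { schema: 'iglu:com.acme/user/jsonschema/1-0-0', data: { id: 'u1' } };
}
function badGenerator() {
  return 'not a context';
}
function throwingGenerator(payload) {
  return payload.missing.field; // throws a TypeError
}

// Only goodGenerator's context survives; the event itself is still sent
var attached = safeContextsFor(null, [goodGenerator, badGenerator, throwingGenerator], { e: 'se' }, 'se');
```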

Combining global contexts

If a user would like to attach more than one context array globally, addGlobalContext can simply be called multiple times. For example, let’s say that one array of contexts must be sent with conversion events whilst another must be sent with engagement events. Two addGlobalContext calls, each with its own condition, would send the appropriate contexts with each event type.

If the two arrays share some contexts, the user could instead make one addGlobalContext call for the shared contexts, sent with both event types, and two further calls for the contexts unique to engagement and conversion events respectively.

Collisions of context arrays sent globally and sent using first-class tracking methods

For the first release of this feature, we’ve decided not to include any safeguards against a user defining a context globally and also sending it via a first-class tracking method.

To clarify with an example, let’s say we’re tracking a conversion event using trackStructEvent, with a user context in its contexts array providing metadata about the user who triggered the conversion. Let’s also say that we’ve set the same user context to be sent globally with all events. Assuming the data conforms to its schema, the event would be sent successfully with two user context entries. Whilst safeguarding against this would be possible, it wouldn’t be trivial, so at this stage we’ve decided that our best option is to warn users to set global contexts carefully.

Collisions of multiple context arrays sent using separate global methods

Let’s say that a customer configures a custom user context to be sent with all events, and also configures a context array, which includes the same user context, to be sent conditionally with all conversion events. Assuming the data conforms to the appropriate schemas, each conversion event would be sent with two user contexts (alongside the other contexts in the conditionally sent array). Following the same logic as the previous decision, we’ve decided against including any safeguards in the initial release.
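The first release deliberately omits safeguards, but detecting the collision itself is straightforward; deciding which duplicate to keep is the non-trivial part. As an illustration only, the hypothetical helper findDuplicateSchemas below reports schemas that appear more than once in an event’s final context array (the com.acme schemas are made up).

```javascript
// Hypothetical check (not part of the proposal): list schemas that appear
// more than once in the final context array of a single event.
function findDuplicateSchemas(contexts) {
  var counts = Object.create(null); // no prototype keys to collide with
  contexts.forEach(function (ctx) {
    counts[ctx.schema] = (counts[ctx.schema] || 0) + 1;
  });
  return Object.keys(counts).filter(function (schema) {
    return counts[schema] > 1;
  });
}

var userContext = { schema: 'iglu:com.acme/user/jsonschema/1-0-0', data: { id: 'u1' } };
var conversionContext = { schema: 'iglu:com.acme/conversion/jsonschema/1-0-0', data: { value: 10 } };

// The same user context attached once globally and once by the conditional provider
var finalContexts = [userContext, conversionContext, userContext];
var duplicates = findDuplicateSchemas(finalContexts);
// duplicates: ['iglu:com.acme/user/jsonschema/1-0-0']
```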

Future work

As mentioned earlier in this document, the consensus reached on global contexts for the JS tracker will inform how they work in the iOS and Android trackers. Work on our mobile trackers will start as soon as the JS tracker update has been released.

Aside from this, our biggest open question is the design of the filtering syntax for schemas. Users have a lot of power to send, or prevent sending, contexts based on any regex or combination of regexes, so we’ll likely need to refine the design to make such rules easier to define.

Lastly, errors in functions passed to the method, and context arrays being sent multiple times when this isn’t the desired behaviour, are issues we’ll most likely need to address. We expect future global context releases to focus on safeguards against issues like those discussed in the previous section.

