Tracking email events (sends, opens and clicks) with Snowplow [tutorial]

Tracking email sends, opens and clicks is a common Snowplow use case. In this post we describe how to track each of the three.

1. Tracking email sends

How you do this depends on how you’re sending emails.

If you’ve written your own application to send emails, you’d use one of our trackers to record the send event: for example, our Python tracker if a Python app sends the email, or our Java tracker if a Java app does.

If you’re using a third party email provider e.g. SendGrid or Mandrill, depending on the provider it may be possible to automatically grab this data via our webhooks. You would configure the mail provider to stream event-level data into your Snowplow pipeline: instructions on how to do this can be found here. If you’re using a third party that we do not currently support, then let us know and we can look at adding support for that third party.

2. Tracking email opens

Again, how you do this depends on how you’re sending emails.

If you’re sending your own emails, you’d embed a Snowplow pixel in the email and add the data points you want to pass into Snowplow as query string parameters. An example would be:

<img src="http://my.collector.url/i?e=se&p=web&tv=no-js-0.1.0&se_ca=email&se_ac=open&se_la={{email_id}}&se_pr={{recipient_id}}">

When the email is opened the pixel will be fetched and the event recorded.

In the above example we’re recording the event as a custom structured event. You can decipher the meaning of each name/value pair on the query string by referring to the Snowplow Tracker Protocol. Note that:

  1. The parameters e (event), p (platform) and tv (tracker version) are required for the event to pass validation.
  2. The other parameters se_ca, se_ac, se_la and se_pr are specific to custom structured events. If you were recording the event using another event ‘type’ you’d pick different parameters. Custom structured events are a good option if you can fit the data you want to record in the five fields (category, action, label, property and value) that the event includes.
  3. In the example we’ve assumed that {{email_id}} and {{recipient_id}} will be dynamically populated into the label and property fields respectively, so that you record which email was opened, and who opened it.
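As a rough illustration of the pixel URL described above, here is a minimal Python sketch that builds it for one email and recipient. The function name, the collector base URL and the sample IDs are assumptions made for this example; only the parameter names come from the tag shown earlier.

```python
from urllib.parse import urlencode


def open_pixel_url(collector_base: str, email_id: str, recipient_id: str) -> str:
    """Build the /i pixel URL that records an email open as a structured event.

    collector_base is assumed to be the full collector address, including scheme.
    """
    params = {
        "e": "se",               # event type: custom structured event
        "p": "web",              # platform
        "tv": "no-js-0.1.0",     # tracker version
        "se_ca": "email",        # category
        "se_ac": "open",         # action
        "se_la": email_id,       # label: which email was opened
        "se_pr": recipient_id,   # property: who opened it
    }
    # urlencode percent-encodes the values so IDs with spaces or symbols stay valid.
    return f"{collector_base.rstrip('/')}/i?{urlencode(params)}"


print(open_pixel_url("http://my.collector.url", "newsletter-42", "user 7"))
# http://my.collector.url/i?e=se&p=web&tv=no-js-0.1.0&se_ca=email&se_ac=open&se_la=newsletter-42&se_pr=user+7
```

The returned URL would go in the src attribute of the img tag. Encoding the values in code rather than in the template avoids broken pixels when an ID contains characters such as & or a space.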

A tip: use the Iglu Webhook to record email opens where you’ve defined your own email open schema

If you want to record more data than the custom structured events support, or you’d prefer to define your own schema, we recommend sending the data into Snowplow using the Iglu Webhook. This enables you to send the data points for your own unstructured event as a set of name/value pairs directly on the querystring (rather than composing a self-describing JSON). If you had a simple email_open schema like the following:

{
	"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
	"description": "Schema for an email open event",
	"self": {
		"vendor": "com.mycompany",
		"name": "email_open",
		"format": "jsonschema",
		"version": "1-0-0"
	},

	"type": "object",
	"properties": {
		"emailId": {
			"type": "string"
		},
		"recipientId": {
			"type": "string"
		}
	},
	"additionalProperties": false
}

Your tag would look like:

<img src="http://my.collector.url/com.snowplowanalytics.iglu/v1?schema=iglu%3Acom.mycompany%2Femail_open%2Fjsonschema%2F1-0-0&emailId={{email_id}}&recipientId={{recipient_id}}">

Note:

  • The url host is the collector host, as before
  • The path is /com.snowplowanalytics.iglu/v1, to indicate to Snowplow that this should use the Iglu webhook (rather than the 1x1 pixel that is served from /i)
  • The Iglu schema then needs to be sent as one of the parameters.
  • The different fields in the schema can then be passed in as name/value parameters on the querystring
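The points above can be sketched in Python as follows. This is only an illustration: the function name, the collector base URL and the sample field values are assumptions, while the path and the schema parameter are taken from the tag above.

```python
from urllib.parse import urlencode

# Path that tells the collector to treat the request as an Iglu webhook call.
IGLU_WEBHOOK_PATH = "/com.snowplowanalytics.iglu/v1"


def iglu_webhook_url(collector_base: str, schema_uri: str, fields: dict) -> str:
    """Build an Iglu webhook URL carrying the schema URI plus the event's fields."""
    if "schema" in fields:
        # "schema" is the parameter that carries the Iglu schema URI itself.
        raise ValueError("'schema' cannot be used as a field name")
    params = {"schema": schema_uri, **fields}
    # urlencode percent-encodes ':' and '/' in the schema URI.
    return f"{collector_base.rstrip('/')}{IGLU_WEBHOOK_PATH}?{urlencode(params)}"


print(iglu_webhook_url(
    "http://my.collector.url",
    "iglu:com.mycompany/email_open/jsonschema/1-0-0",
    {"emailId": "e1", "recipientId": "r9"},
))
# http://my.collector.url/com.snowplowanalytics.iglu/v1?schema=iglu%3Acom.mycompany%2Femail_open%2Fjsonschema%2F1-0-0&emailId=e1&recipientId=r9
```

Because the email_open schema sets additionalProperties to false, the keys passed in fields should match the property names in the schema exactly.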

3. Tracking clicks on links in emails

The most common route to tracking clicks in emails is to decorate any links in the email with querystring parameters (e.g. utm parameters). Then when a user clicks on the link, the resulting page view will be recorded in Snowplow with the relevant information on the email and recipient sent through on the querystring parameters. This is described here.
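A minimal Python sketch of link decoration is shown here. The function name, the utm values and the recipient_id parameter are hypothetical choices for illustration; which parameters you use depends on how you want to analyse the resulting page views.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def decorate_link(url: str, email_id: str, recipient_id: str) -> str:
    """Append email tracking parameters to a link, keeping its existing query and fragment."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", "newsletter"),     # assumed value for this example
        ("utm_medium", "email"),
        ("utm_campaign", email_id),       # identifies which email was sent
        ("recipient_id", recipient_id),   # hypothetical custom parameter
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(decorate_link("https://example.com/pricing?ref=top#plans", "spring-sale", "r9"))
# https://example.com/pricing?ref=top&utm_source=newsletter&utm_medium=email&utm_campaign=spring-sale&recipient_id=r9#plans
```

Parsing the URL instead of concatenating strings keeps links that already have a querystring or a fragment intact.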

In some cases it may not be possible to use the above method (e.g. if you’re driving traffic to a website you don’t own or don’t have tracking set up on). In this case you might want to record the click via a redirect on the collector. Click tracking this way works as follows:

  1. You point the href attribute of your links at the Snowplow collector. That means that when a user clicks on a link in the email, they are taken to your Snowplow collector, so that the click event can be recorded.
  2. You set the URL path in the href to /r/tp2. This tells the Snowplow collector to record a URI redirect, and then redirect the user to the URL specified in step 3.
  3. You add a parameter u={{url}} to the collector URL in the href attribute, where {{url}} is the URL-encoded address that the user should be forwarded to after the click is tracked in Snowplow. The collector uses this to redirect the user to the correct target URL once the click has been tracked.
  4. OPTIONAL. In addition, you can add other name/value pairs to the URL querystring as per the Snowplow Tracker Protocol. That means you can choose how to describe / schema the click event and what data points you want to record with each click event (e.g. pass in the different fields available in your email platform). Note that if you do this the event will be set to the event type indicated by the e= parameter on the querystring, and the uri-redirect will be recorded as a context.

The example below is a link that will redirect to our github repo:

Check us out on <a href="http://collector.snplow.com/r/tp2?u=https%3A%2F%2Fgithub.com%2Fsnowplow">Github</a>
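A redirect link of this shape can be generated with a few lines of Python. This is a sketch under stated assumptions: the function name and the extra_params argument are made up for illustration, while the /r/tp2 path and the u parameter are the ones described in the steps above.

```python
from typing import Optional
from urllib.parse import urlencode


def redirect_href(collector_base: str, target_url: str,
                  extra_params: Optional[dict] = None) -> str:
    """Build a collector redirect link that records the click, then forwards to target_url."""
    # Optional Tracker Protocol name/value pairs go first; u is set last so
    # that extra_params cannot overwrite the redirect target.
    params = dict(extra_params or {})
    params["u"] = target_url
    # urlencode percent-encodes the target URL (':' -> %3A, '/' -> %2F).
    return f"{collector_base.rstrip('/')}/r/tp2?{urlencode(params)}"


print(redirect_href("http://collector.snplow.com", "https://github.com/snowplow"))
# http://collector.snplow.com/r/tp2?u=https%3A%2F%2Fgithub.com%2Fsnowplow
```

The output matches the href in the example link, which is a quick way to check that the target URL is being encoded as the collector expects.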

For more information see our guide to click tracking.

Hey @yali, thx for the tutorial. It’s really close to what we do. But shouldn’t the platform parameter reflect the fact that those events actually come from email clients, and hence be set to “email”? Just to be able to differentiate them from events that come from the real website after a click.

Good point @christoph-buente! I think currently the platform field only accepts certain values but we should extend these - I’ve created a ticket here: https://github.com/snowplow/snowplow/issues/2814

OK, what I want is to use the Iglu webhook to track custom data, as you did in point 2.

I’m using emr-etl-runner for enrichment and the Clojure collector.

So I’ve created the schema, uploaded it and the jsonpaths to my Iglu registry, and also created the Redshift database for that schema.

What I do not understand is why I can’t track anything. For example, I open the URL _my.collector.url/com.snowplowanalytics.iglu/v1?schema=…

And after running everything no data is loaded. By the way, my schema has no integers, just string values.

I am wondering whether the problem is com.snowplowanalytics.iglu, and whether I have to add an enrichment, a schema or something else to the enrichment process.

Hi @Germanaz0 - please open a separate thread with your issue as it doesn’t relate to this specific tutorial.

I think that it is related. I think the problem is that I am sending the pixel like this:

my.collector.url/com.snowplowanalytics.iglu/v1?schema=iglu%3Acom.mycompany%2Femail_open%2Fjsonschema%2F1-0-0&emailId=test@email.com&recipientId=

Where the recipientId is empty, and for some reason the collector does not work

Thanks for the article Yali!
It was really helpful as I start building out our email tracking module, and the fact that a redirection engine is built into the collector was truly a revelation! Learning that I won’t have to build out a separate redirection server (possibly using lambda) made my day!

My website uses Sailthru as its email provider. Sailthru is a very sophisticated platform that tracks a lot of data on subscribers, which is very useful for my company. Though some of this data is available through its API and postbacks, many attributes and data sets are not. Since there’s no webhook for Sailthru as yet, I was planning to build out an independent tracker (opens and clicks) that can sit inside the email templates and be independent of Sailthru’s own trackers.

My immediate question regarding this is about the additional load that all the redirection could cause on the collector app. Currently it handles about 60,000 events/day from the website. Once we create redirect links for our email templates, the number of events will increase to about 1 million/day (though these events are usually paced and scheduled around 4 specific times in a day). What recommendation would you make for handling this kind of traffic?

Thanks very much!

@kjain - I recommend bumping the collector to use more instances prior to putting redirect tracking live across all your email clicks. (E.g. bump it from 2 to 4 instances.) Then check on the instance health once you’ve gone live and if all looks good, gradually scale the collector cluster back down.