"Account" concept in Snowplow events

Hi there,

Soon we will start operating in the US, and we have started thinking about how that affects our event data collection. Since it’s a different market and we will open a company there, we would like to tag the traffic to differentiate it from the UK operations. At the same time, we would like to reuse our current infrastructure.

One option would be to tag the events with an “account ID” (or something equivalent), but the canonical event model doesn’t have a field that could be used for this purpose. The closest one would probably be app_id, but we’re already using it to tag which application fired the event.

To sum up, I guess the general question is: how would you recommend that a multinational company use Snowplow so that each market can be analyzed independently?

Many thanks,
Dani

Hi @danisola,

Just wanted to give you a hint about

If you do need to tag your events with an “account ID”, it could be done with custom contexts. Any track method (taking the JavaScript tracker as an example) can carry custom contexts as its last argument.

For example, the pageview event could be extended as shown below:

window.snowplow('trackPageView', null, [{
    schema: "iglu:com.example_company/market/jsonschema/1-0-0",
    data: {
        accountID: 'US12345'
    }
}]);

Note that the added value won’t go into the atomic.events table. You need to create a new dedicated table to load those values into. In the above example, the new table will be atomic.com_example_company_market_1. Additionally, you will have to create a JSON schema to validate your values against, as well as a JSONPaths file to facilitate the Redshift data load into this table.
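To make the schema and the table naming a bit more concrete, here is a minimal sketch in JavaScript. The schema body (a single required string accountID with an arbitrary length limit) is an assumption for illustration, and igluUri and redshiftTable are hypothetical helpers, not part of any Snowplow library. The table-name rule is deliberately simplified (it only replaces dots in the vendor and keeps the first number of the version), which is enough to reproduce the table name mentioned above.

```javascript
// Hypothetical self-describing JSON schema for the "market" context.
// The property list is an assumption; adapt it to what you actually send.
const marketSchema = {
  $schema: "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  description: "Market/account that an event belongs to",
  self: {
    vendor: "com.example_company",
    name: "market",
    format: "jsonschema",
    version: "1-0-0"
  },
  type: "object",
  properties: {
    accountID: { type: "string", maxLength: 64 } // length limit is arbitrary
  },
  required: ["accountID"],
  additionalProperties: false
};

// Build the Iglu URI that the tracker call references in its `schema` field.
function igluUri(self) {
  return `iglu:${self.vendor}/${self.name}/${self.format}/${self.version}`;
}

// Simplified naming rule: dots in the vendor become underscores and only the
// first number of the version is kept as the table suffix.
function redshiftTable(self) {
  const model = self.version.split("-")[0];
  const vendor = self.vendor.replace(/\./g, "_");
  return `atomic.${vendor}_${self.name}_${model}`;
}

console.log(igluUri(marketSchema.self));
console.log(redshiftTable(marketSchema.self));
```

With the values above, the first helper returns the same schema URI used in the trackPageView call and the second returns atomic.com_example_company_market_1.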

Once populated, you can link the pageview event (or whatever event you add your contexts to) to your “Account ID” via the relation event_id = root_id and collector_tstamp = root_tstamp, where event_id and collector_tstamp come from the atomic.events table and root_id and root_tstamp come from your dedicated table (here atomic.com_example_company_market_1).
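The relation can be illustrated with a small in-memory sketch in JavaScript. In Redshift this would be a JOIN on the same two column pairs; the rows, timestamps and the joinAccount helper below are all made up for illustration.

```javascript
// Made-up rows standing in for atomic.events.
const events = [
  { event_id: "e1", collector_tstamp: "2016-05-01 10:00:00", event: "page_view" },
  { event_id: "e2", collector_tstamp: "2016-05-01 10:05:00", event: "page_view" }
];

// Made-up rows standing in for atomic.com_example_company_market_1.
const marketContexts = [
  { root_id: "e1", root_tstamp: "2016-05-01 10:00:00", accountID: "US12345" }
];

// Attach accountID to each event by matching
// event_id = root_id AND collector_tstamp = root_tstamp.
// Events without a matching context row get accountID = null (a left join).
function joinAccount(eventRows, contextRows) {
  const byRoot = new Map();
  for (const c of contextRows) {
    byRoot.set(`${c.root_id}|${c.root_tstamp}`, c);
  }
  return eventRows.map((e) => {
    const match = byRoot.get(`${e.event_id}|${e.collector_tstamp}`);
    return { ...e, accountID: match ? match.accountID : null };
  });
}

console.log(joinAccount(events, marketContexts));
```

Matching on both columns rather than on event_id alone follows the relation described above; events that were sent without the context simply come back with a null accountID.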

Some additional wiki pages about the topic:

–Ihor

How will you be launching the products/services into the US market? For example, if it’s a different site (example.co.uk vs example.com), you’ll get the domain and its associated levels for free.

If it’s just a single site for multiple markets, things get a little bit more interesting, but it’s a perfect use case for the GeoIP information that Snowplow provides as an enrichment. The data source (Maxmind) can provide you with postcode, city and country level information for a given IP address (with varying degrees of accuracy depending on location). That should let you very easily query traffic originating from the US (geo_country holds the two-letter ISO country code): select * from atomic.events WHERE geo_country = 'US'

@danisola

We ran into a similar situation a while back and ultimately settled on a naming convention inside the app_id, for example us-website vs uk-website. We chose this over the custom context suggested by @ihor so that developers don’t have to remember to add a context across our 100+ apps.

Once a naming convention is established, data modeling can be done in Redshift to parse the app_id out into separate columns so the market is easily accessible at analysis time.
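As a rough sketch of such a convention in JavaScript: the market list, the "-" separator and the helper names buildAppId and parseAppId are assumptions for illustration, not something prescribed by Snowplow. The same split would be done in SQL on the Redshift side; this only shows the logic.

```javascript
// Assumed set of market prefixes; extend as new markets launch.
const MARKETS = ["us", "uk"];

// Compose an app_id such as "us-website" on the tracking side.
function buildAppId(market, app) {
  if (!MARKETS.includes(market)) {
    throw new Error(`Unknown market: ${market}`);
  }
  return `${market}-${app}`;
}

// Split an app_id back into market and app. Only the first "-" is treated as
// the separator, so app names may contain dashes themselves. Values without a
// known market prefix are returned with market = null.
function parseAppId(appId) {
  const idx = appId.indexOf("-");
  if (idx <= 0) {
    return { market: null, app: appId };
  }
  const market = appId.slice(0, idx);
  if (!MARKETS.includes(market)) {
    return { market: null, app: appId };
  }
  return { market, app: appId.slice(idx + 1) };
}

console.log(parseAppId(buildAppId("us", "website")));
```

Keeping the prefix list explicit means an app_id recorded before the convention was introduced (for example a plain website) is not misread as belonging to a market.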


Hi all, thanks for your answers!

@ihor using contexts is a viable option, but we prefer to avoid it because of the performance cost of joining several tables in Redshift. For an unstructured event, that solution would require joining the events table, the shredded table and the context table for most queries. Ideally we would like to use a field in the events table.

@mike we will have different domains, but page_urlhost is only set for client-side events and we have quite a few that are server-side. Or maybe you were thinking about another field? The Maxmind solution is also viable, but we prefer to use a field set by the app, as opposed to one derived from other fields.

@digitaltouch I hadn’t thought about your solution, prefixing the app_id field with the country solves the problem quite nicely!

Thanks again for all your answers. I’ll let you know which one we use once we’ve discussed it.
