Tracking multi-lingual domains

Hi all,

We’re changing a website to be multi-lingual, and the languages/regions will be separated using sub-folders (domain-name.com/fr/, domain-name.com/de/, etc.).

What is the best way to set up the JavaScript tracker so that pages duplicated across multiple languages are easy to analyse?

For example, Page A may exist 3 or 4 times:

domain-name.com/fr/example-page
domain-name.com/es/example-page
domain-name.com/de/example-page

Analysing these pages in aggregate could prove cumbersome, as the paths will all be distinct even though they are the same page.
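If the slugs really are identical across languages, one stopgap is to strip the language folder from the path at analysis time. A minimal Python sketch, assuming the language, when present, is always the first path segment; `LANGUAGE_FOLDERS` and `split_language_path` are hypothetical names:

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical set of language sub-folders; replace with the site's real set.
LANGUAGE_FOLDERS = {"fr", "es", "de"}


def split_language_path(url: str) -> Tuple[Optional[str], str]:
    """Return (language, language-independent path) for a sub-folder URL.

    Assumes the language, when present, is the first path segment,
    e.g. domain-name.com/fr/example-page -> ("fr", "/example-page").
    """
    # urlparse only fills in .path reliably when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[0] in LANGUAGE_FOLDERS:
        return segments[0], "/" + "/".join(segments[1:])
    return None, "/" + "/".join(segments)


for u in ("domain-name.com/fr/example-page",
          "domain-name.com/es/example-page",
          "domain-name.com/de/example-page"):
    print(split_language_path(u))
```

This breaks as soon as slugs are translated (e.g. a French page at /fr/exemple-page), which is one reason a stable page ID sent with the event is more robust than parsing URLs.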

My initial thought is to send a custom context with every page view, containing metadata about the page in general and the country/region. For example, a self-describing JSON could include the fields:

Page ID
Page Type
Page Category
Region
Language
Product ID
etc…

Is this a good solution? It would mean we could analyse at Page ID level, assuming the Page ID is the same for the product regardless of whether the page is in French or German.
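Assuming every page view carries the proposed fields, analysis at Page ID level becomes a simple group-by, with language still available for drill-down. A sketch in Python with made-up rows (in the warehouse this would be a GROUP BY on the context's page_id column):

```python
from collections import Counter, defaultdict

# Hypothetical page view rows carrying fields from the proposed custom
# context. Values are illustrative only.
page_views = [
    {"page_url": "domain-name.com/fr/example-page", "page_id": "page-a", "language": "fr"},
    {"page_url": "domain-name.com/es/example-page", "page_id": "page-a", "language": "es"},
    {"page_url": "domain-name.com/de/example-page", "page_id": "page-a", "language": "de"},
    {"page_url": "domain-name.com/de/example-page", "page_id": "page-a", "language": "de"},
    {"page_url": "domain-name.com/fr/other-page", "page_id": "page-b", "language": "fr"},
]

# Aggregate across languages: one count per Page ID, whatever the URL.
views_per_page = Counter(pv["page_id"] for pv in page_views)

# Per-language breakdown for each Page ID.
views_per_page_language = defaultdict(Counter)
for pv in page_views:
    views_per_page_language[pv["page_id"]][pv["language"]] += 1

print(views_per_page)
print(dict(views_per_page_language))
```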

Does this seem like a logical structure? Does anyone have any experience in using Snowplow in sites with multiple territories/languages?

Any help is appreciated.

Jordan


A quick & simple solution would be to pass a different app_id for each country, depending on the page URL.
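A minimal sketch of choosing the app_id from the URL, assuming the language folder is the first path segment; the mapping and the `website-xx` names are hypothetical. On the page, the equivalent logic would run in JavaScript before the tracker is initialised, and the result would go into the tracker's `appId` setting.

```python
# Hypothetical mapping from language sub-folder to app_id.
APP_IDS = {"fr": "website-fr", "es": "website-es", "de": "website-de"}
DEFAULT_APP_ID = "website"  # used for paths without a language folder


def app_id_for_path(path: str) -> str:
    """Choose an app_id from the first path segment, e.g. '/fr/example-page'."""
    first_segment = path.strip("/").split("/", 1)[0]
    return APP_IDS.get(first_segment, DEFAULT_APP_ID)


print(app_id_for_path("/fr/example-page"))  # website-fr
print(app_id_for_path("/about"))            # website
```

This only partitions traffic by country; the same page still has a different path in each language.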

Thanks Bernard.

That would still not allow analysts to analyse pages or products in aggregate across languages/regions though.

I think sending an additional context makes sense, particularly if you plan on including additional data such as page category and type, as you’ve listed above. The data you’ve described sounds like a great candidate to sit in a dataLayer on the page and be injected into a page view context.
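A sketch of turning dataLayer values into a self-describing JSON context, written in Python for illustration (on the page this would be JavaScript, and the result would be passed in the context array of the page view call). The Iglu schema URI, field names and function are assumptions; the `{"schema": ..., "data": ...}` envelope is Snowplow's self-describing JSON format.

```python
import json

# Hypothetical Iglu schema URI; it must match a schema hosted in your own registry.
PAGE_CONTEXT_SCHEMA = "iglu:com.example/page_meta/jsonschema/1-0-0"

REQUIRED_FIELDS = ("page_id", "page_type", "page_category", "region", "language")


def page_context_from_data_layer(data_layer: dict) -> dict:
    """Build a self-describing JSON context from dataLayer values.

    Raises ValueError if a required field is missing or empty, so gaps
    surface during testing rather than as NULLs in the warehouse.
    """
    missing = [f for f in REQUIRED_FIELDS if not data_layer.get(f)]
    if missing:
        raise ValueError(f"dataLayer is missing: {', '.join(missing)}")
    data = {f: data_layer[f] for f in REQUIRED_FIELDS}
    # product_id is optional: not every page shows a product.
    if data_layer.get("product_id"):
        data["product_id"] = data_layer["product_id"]
    return {"schema": PAGE_CONTEXT_SCHEMA, "data": data}


data_layer = {  # hypothetical values for /fr/example-page
    "page_id": "page-a", "page_type": "product", "page_category": "shoes",
    "region": "FR", "language": "fr", "product_id": "SKU-123",
}
print(json.dumps(page_context_from_data_layer(data_layer), indent=2))
```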

This data should compress really well assuming the cardinality is relatively low across the dimensions that you’ve specified.
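To illustrate why low cardinality helps, here zlib stands in for the warehouse's own column compression (an assumption; the real encodings differ), compressing 10,000 rows drawn from only 3 page types, 3 regions and 3 languages:

```python
import json
import random
import zlib

rng = random.Random(42)  # fixed seed so the run is repeatable

PAGE_TYPES = ["home", "category", "product"]
REGIONS = ["FR", "ES", "DE"]
LANGUAGES = ["fr", "es", "de"]

# 10,000 context rows drawn from only 27 possible combinations.
rows = [
    {"page_type": rng.choice(PAGE_TYPES),
     "region": rng.choice(REGIONS),
     "language": rng.choice(LANGUAGES)}
    for _ in range(10_000)
]

raw = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
compressed = zlib.compress(raw, 9)
ratio = len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, ratio: {ratio:.3f}")
```

The exact ratio depends on the data; high-cardinality fields such as a free-text page title would compress far less well.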

The other option is to keep a table external to Snowplow with one row per page, holding the same information as the context in addition to the URL, and join it to atomic.events later. The advantage is that you avoid the duplication you would get by sending the context with every page view. The disadvantage and compromise is that if a single page changes type or category over time, this becomes difficult: you only ever have the latest state of each page in this table. Type and category may be a poor example, but the products displayed on a page (in the product_id/product_ids field) can change, and that may be something you want to know for each page view, i.e. which user has potentially seen which products, particularly if you are A/B testing or doing something similar.
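A sketch of that join and its "latest state only" limitation, using made-up rows. The table layout and names are hypothetical; in atomic.events the join key could be the page_urlpath column.

```python
# Hypothetical page metadata table maintained outside Snowplow:
# one row per URL path, holding only the *current* state of each page.
page_meta = {
    "/fr/example-page": {"page_id": "page-a", "product_ids": ["SKU-2"]},
    "/de/example-page": {"page_id": "page-a", "product_ids": ["SKU-2"]},
}

# Hypothetical page view events; e1 happened while the page still showed SKU-1.
events = [
    {"event_id": "e1", "page_urlpath": "/fr/example-page"},
    {"event_id": "e2", "page_urlpath": "/de/example-page"},
    {"event_id": "e3", "page_urlpath": "/es/example-page"},  # page not in the table yet
]


def join_page_meta(events, page_meta):
    """Left join each event to the page metadata on URL path."""
    joined = []
    for event in events:
        meta = page_meta.get(event["page_urlpath"], {})
        joined.append({**event,
                       "page_id": meta.get("page_id"),
                       "product_ids": meta.get("product_ids")})
    return joined


for row in join_page_meta(events, page_meta):
    print(row)
# e1 comes back with ["SKU-2"], not the SKU-1 the user actually saw:
# the table only knows the latest state of each page.
```

Using a left join keeps page views for pages missing from the table (e3) with empty metadata rather than silently dropping them.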

Thanks Mike, your second option is not an option we had considered.

We have been using the Snowplow PHP tracker to send information from the back end of the website (mainly user and order information), so perhaps we could use the PHP tracker to send this product information into a product meta table, and join that table to page view/session tables through data modelling.

One concern we had with sending the page view context was that it would very quickly inflate data storage, since we’d essentially be sending two rows for every page view. Either of these options should help mitigate that.