Move timezone detection to backend


#1

When we looked into ways to minimize the size of the JavaScript tracker, we found that it carries a large lookup table to detect the browser's timezone. I think this can clearly be moved into the backend, as the timezone information itself is not used in the tracker at all. There is a closed PR that removes the lookup table, but without the corresponding functionality in the backend, it does no good.

What would be the ideal place to fill the timezone field instead of the tracker itself? Would it be the collector, as the first touchpoint? Or would it be a built-in enrichment?

Calling you all for opinions. Thanks


#2

If anything, an enrichment seems more sensible here. Adding enrichment logic to the collector just feels messy.

It will probably need a new parameter in the tracker protocol: https://github.com/snowplow/snowplow/wiki/snowplow-tracker-protocol#timestamp

And would we need to maintain hash maps across tracker implementations and the enrichment, e.g. for new time zones or changes to DST? What if they go out of sync, or we have to re-enrich old data?

Should we consider letting enrichment just figure it out based on raw data instead?


#3

I wouldn’t put it in the collector because that’s just too tricky, but I think it makes some sense in the enrichment. That said, I wonder if we could just replace the heavy lookup tables in jstimezonedetect with calls to the Intl API, e.g.:

Intl.DateTimeFormat().resolvedOptions().timeZone

It’s supported well enough in browsers now, with the exception of UC Browser, which has pretty heavy usage in China (and China only has one timezone). I’m not sure about the coverage within Android WebView, which could be an issue.
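To make the Intl idea a bit more concrete, here is a minimal sketch of what the tracker side could look like, assuming we send the IANA name when the browser provides it and always send the raw UTC offset so the backend has something to work with otherwise. The function names and the `timezone` / `offsetMinutes` field names are made up for illustration and are not part of the tracker or the tracker protocol.

```javascript
// Returns the IANA timezone name reported by the runtime (e.g. "Europe/Berlin"),
// or null when the Intl API is missing or does not report a zone.
function detectTimezone() {
  try {
    if (typeof Intl === 'undefined' || typeof Intl.DateTimeFormat !== 'function') {
      return null;
    }
    const tz = Intl.DateTimeFormat().resolvedOptions().timeZone;
    return typeof tz === 'string' && tz.length > 0 ? tz : null;
  } catch (e) {
    // Some older runtimes throw instead of returning undefined.
    return null;
  }
}

// Hypothetical payload fragment: field names are assumptions, not protocol parameters.
function timezoneFields(now = new Date()) {
  return {
    timezone: detectTimezone(),
    // getTimezoneOffset() is minutes *behind* UTC, so flip the sign.
    offsetMinutes: -now.getTimezoneOffset(),
  };
}
```

With something like this, a `null` timezone would be the signal for the backend to backfill, and the lookup table could leave the tracker entirely.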

The only reason I don’t like the idea of putting this in the enricher is that it feels like this information should be provided by the underlying OS / browser and not inferred from a timezone offset. Having this as an enrichment would also mean having to maintain / update the underlying IANA databases, which would have to be pulled from S3 or a similar service.
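A small sketch of why inferring the zone from an offset alone is lossy: two different IANA zones can share an offset at one point in the year and differ at another. The helper below is hypothetical and relies on the runtime shipping timezone data for `Intl.DateTimeFormat` (recent Node.js versions and modern browsers do).

```javascript
// UTC offset, in minutes, of an IANA zone at a given instant.
// Formats the instant as wall-clock time in that zone, reads the parts back
// as if they were UTC, and takes the difference from the real instant.
function offsetMinutes(timeZone, date) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric',
    month: 'numeric',
    day: 'numeric',
    hour: 'numeric',
    minute: 'numeric',
    second: 'numeric',
  }).formatToParts(date);
  const get = (type) => Number(parts.find((p) => p.type === type).value);
  const asUtc = Date.UTC(
    get('year'), get('month') - 1, get('day'),
    get('hour'), get('minute'), get('second')
  );
  // Rounding absorbs the milliseconds dropped by the formatted parts.
  return Math.round((asUtc - date.getTime()) / 60000);
}

const january = new Date(Date.UTC(2021, 0, 15, 12, 0, 0));
const july = new Date(Date.UTC(2021, 6, 15, 12, 0, 0));

const sameInWinter =
  offsetMinutes('Europe/Berlin', january) === offsetMinutes('Africa/Lagos', january);
const sameInSummer =
  offsetMinutes('Europe/Berlin', july) === offsetMinutes('Africa/Lagos', july);
```

In January both zones are at UTC+1, so an offset of 60 minutes cannot tell them apart; in July Berlin moves to UTC+2 while Lagos stays at UTC+1. Any offset-based enrichment would have to pick among such candidates.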


#4

The reason I think it cannot be an enrichment in the classic sense is that enrichments cannot change any fields inside the canonical model, except the derived contexts. So it will either be a built-in enrichment that is on by default and does not adhere to that rule, or it is done implicitly inside the collector.

I like the idea of detecting the timezone inside the browser with native support. But we would still have to backfill that information for browsers that lack support for the Intl API.

I would not be so worried about downloading a file from S3, as the mechanism already exists for other parts of the pipeline. This will be yet another hosted asset, I guess.