Just thought of another thing - depending on the tracker - do you have ‘respect do not track’ enabled? I’m not sure on the numbers, but if GA respects it and Snowplow does not, that could also lead to a difference.
Failing the above it might be a problem with the cookies.
Does your site spread across multiple domains or subdomains? Can you share the tracking code with us and the rough domains setup?
When comparing Snowplow numbers against GA, we recommend starting by looking at page views by page URL. There is very little business logic associated with recording a page view, so these numbers should typically agree very closely, with Snowplow reporting higher numbers because we don’t remove bots from the list. (Note that GA will remove many more bots than are identified by the user agent parsing libraries that are available with Snowplow. The size of the discrepancy then reflects how much bot traffic your site attracts - we see it varying between 3 and 15% for e.g. jobs boards and other sites that attract crawlers.)
Often carrying out the above step throws up differences in tracking implementation between GA and Snowplow. (E.g. pages that are missed with one and not the other.)
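As a rough sketch of that coverage check, assuming you have exported page views per URL from both tools (the dictionaries and URLs below are made-up sample data, not real numbers), you can list the pages that only one tracker has seen:

```python
# Hypothetical page-view counts per URL, as exported from each tool.
ga_views = {"/": 1000, "/jobs": 480, "/about": 120}
sp_views = {"/": 1040, "/jobs": 510, "/about": 125, "/checkout": 300}


def coverage_gaps(ga, sp):
    """Return the URLs seen by only one of the two trackers."""
    only_ga = sorted(set(ga) - set(sp))
    only_sp = sorted(set(sp) - set(ga))
    return only_ga, only_sp


only_ga, only_sp = coverage_gaps(ga_views, sp_views)
print("Only in GA:", only_ga)
print("Only in Snowplow:", only_sp)
```

A URL that shows up on one side only is a good candidate for a missing or misconfigured tag.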
If those two numbers agree then explore the difference in unique visitors by page. Both Google and Snowplow primarily base this number on a first party cookie, so again our expectation is that they should be pretty close, with Snowplow reporting higher numbers because of bots.
In general we recommend avoiding comparing session numbers. The GA sessionization logic is very specific (and advertiser-friendly) - the sessionization that the Snowplow JS tracker supports out of the box follows the Adobe simple 30 minute timeout model. If you must compare session numbers we have a guide (incl. SQL):
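Separately from that guide, here is a minimal illustration of what a simple 30 minute inactivity timeout means. It is a sketch with made-up timestamps, not the tracker's actual implementation: a new session starts whenever the gap since the user's previous event exceeds the timeout.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Count sessions for one user: a gap longer than `timeout` starts a new one."""
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            sessions += 1
        previous = ts
    return sessions


# Hypothetical event times for a single user.
events = [
    datetime(2017, 1, 1, 9, 0),
    datetime(2017, 1, 1, 9, 20),   # 20 minute gap: same session
    datetime(2017, 1, 1, 10, 5),   # 45 minute gap: new session
    datetime(2017, 1, 1, 23, 50),  # long gap: new session
    datetime(2017, 1, 2, 0, 10),   # 20 minute gap across midnight: same session
]
session_count = count_sessions(events)
print(session_count)
```

Under this model the last two events belong to one session even though they straddle midnight. As I understand it, GA applies extra rules on top of the timeout, which is one reason session counts rarely line up.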
So to summarise: my guess is that an implementation difference accounts for the very large discrepancy you’ve seen - unless you’re a site that attracts a lot of bots. (In which case - how do you filter these out when comparing?) I take it that your Snowplow numbers are higher, which makes me wonder if your GA coverage isn’t 100%? This should become clear once you start slicing the numbers by URL. It would also be worth understanding if you’ve done anything on the GA side to customize how uniques are identified (e.g. passing in your own user identifiers)?
To filter out bots, we are using
where br_name <> 'Robot/Spider'
That will only get rid of a very limited set of bots and spiders - see
There is a list (which costs a few thousand dollars to buy) that vendors like GA and Adobe can use to filter bots, which I’d expect to be more extensive than the list included in the user agent parsing libraries bundled with Snowplow. I’d also expect Google to have proprietary tech for spotting bots. So it’s possible that these bots account for the difference: I’d still do the check by page URL. If the discrepancy is constant across pages (or bigger for pages that are more likely to be crawled) that would suggest bots account for the discrepancy. If the discrepancy is skewed for particular page URLs, that suggests an implementation issue.
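To make the "constant versus skewed" check concrete, here is a small sketch. The counts, URLs and the 5 percentage point tolerance are all assumptions chosen for illustration. It compares each page's Snowplow-over-GA gap with the site-wide gap and flags the pages that stand out:

```python
# Hypothetical page-view counts per URL from each tool.
ga_views = {"/": 10000, "/jobs": 5000, "/signup": 200}
sp_views = {"/": 10300, "/jobs": 5200, "/signup": 290}


def relative_gap(ga_count, sp_count):
    """Snowplow excess over GA, as a fraction of the GA count."""
    return (sp_count - ga_count) / ga_count


def skewed_pages(ga, sp, tolerance=0.05):
    """Return the site-wide gap and the URLs whose gap deviates from it by more than `tolerance`."""
    shared = [u for u in ga if u in sp and ga[u] > 0]  # skip pages GA never saw
    overall = relative_gap(sum(ga[u] for u in shared), sum(sp[u] for u in shared))
    flagged = {}
    for url in shared:
        gap = relative_gap(ga[url], sp[url])
        if abs(gap - overall) > tolerance:
            flagged[url] = round(gap, 3)
    return overall, flagged


overall, flagged = skewed_pages(ga_views, sp_views)
print(f"Site-wide gap: {overall:.1%}")
print("Pages that deviate:", flagged)
```

If nothing is flagged, the gap is roughly uniform and bots are the more likely explanation. A handful of flagged URLs points at tagging on those pages.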
Should I do the check on page_url?
Yes I would!
So, I started with page views. We are getting a difference of around 2-3%. But the same is not true for UV.
Numbers we are getting for particular month-
Page views - GA (162437659) and Snowplow (167671647), which gives a difference of ~3%
UV - GA (1463973) and Snowplow (1986195), which gives a difference of ~35%
Considered page URL as well.
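For reference, this is how those percentages work out, assuming the GA count is taken as the baseline (the post does not say which side was used as the denominator):

```python
def pct_excess(ga, snowplow):
    """Percentage by which the Snowplow count exceeds the GA count."""
    return (snowplow - ga) / ga * 100


page_views = pct_excess(162437659, 167671647)
unique_visitors = pct_excess(1463973, 1986195)
print(f"Page views: {page_views:.1f}%")
print(f"Unique visitors: {unique_visitors:.1f}%")
```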
The fact that the page views number agrees is great: suggests the tracking tags have been instrumented in a very similar way.
So the question becomes why Snowplow thinks more users have visited the page than GA: it means that Snowplow thinks two users have viewed a page where GA thinks one user’s viewed the page twice. Out-of-the-box, both use a first party cookie ID set on your own domain, so it’s hard to imagine in what circumstance the Snowplow cookie would be deleted but the GA cookie not. How are you identifying users in GA? Is it based on only first party cookie IDs? Or are you pushing in your own user-level identifiers?
One more thing I would like to add.
We are working with multiple domains.
Domain X - UV difference coming around 35% as mentioned above
Domain Y - UV difference around 3%
We have done the same setup for both domains.
Interesting! So it’s an issue that’s isolated to a specific domain.
Are you doing any cross domain tracking with GA? If so, that could have some impact.
On the domain with the issue: are you tracking across subdomains? (E.g. blog.mysite.com, www.mysite.com, app.mysite.com etc.)? If so - what domain have you set with the Snowplow JS tracker? With the Google JS tracker? (Have you set an explicit domain?)
Just trying to think of reasons why a cookie would reset for Snowplow but not for GA…
We do not do cross domain tracking with GA. Both domains have exclusive users from 2 different countries.
Also we are not doing tracking across subdomains.
That’s really odd @vivek291836. Can you share what the domain is? Without looking at it, it’s very hard to know in what situation the Snowplow first party cookie would be deleted but not the GA first party cookie… It’s not something we’ve seen before and in your case the issue is domain-specific, so we need to look at the domain for an explanation…
I shared our domains with you by message. Please let us know if you find anything useful that can help us solve the GA vs Snowplow issue.
Hi @vivek291836 - what about enabling the gaCookies context in the JS tracker for the domain in question? This should enable you to identify exactly where in the data your Snowplow cookie ID (i.e. the domain_userid field) updates but the corresponding GA cookie, stored in the com_google_analytics_cookies_1._ga column, doesn’t.
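As a sketch of that comparison, assuming you have pulled (domain_userid, _ga) pairs out of your events (the rows below are invented sample values), you can group by the GA cookie and look for GA IDs that map to more than one Snowplow ID:

```python
from collections import defaultdict

# Hypothetical rows: (domain_userid, _ga cookie value) for each event.
events = [
    ("sp-1", "GA1.2.111.1"),
    ("sp-2", "GA1.2.111.1"),  # Snowplow ID changed, GA cookie did not
    ("sp-3", "GA1.2.222.2"),
    ("sp-3", "GA1.2.222.2"),
]


def churned_ga_ids(rows):
    """Map each _ga value to its Snowplow IDs, keeping only those with more than one."""
    ids_by_ga = defaultdict(set)
    for domain_userid, ga in rows:
        if ga is not None:  # ignore events where no GA cookie was captured
            ids_by_ga[ga].add(domain_userid)
    return {ga: sorted(ids) for ga, ids in ids_by_ga.items() if len(ids) > 1}


churned = churned_ga_ids(events)
print(churned)
```

Any GA cookie that appears in the result is a visitor GA counts once and Snowplow counts several times.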
If you’d like a member of our team to investigate the domain in question I can send you the details of a support contract?
All the best,
@yali Thanks for your valuable response.
We have enabled the gaCookies context and will be monitoring the com_google_analytics_cookies_1._ga column once data starts coming in.
Also, we are in talks with management about taking help from Snowplow support, so it would be useful if you could share the details as well.
Great stuff Vivek - let us know what you find when comparing the cookie IDs!
All the best,
So, we did analysis on ga cookie IDs for data we got yesterday and what we found is:
- when domain_userid changes, _ga changes as well which is a good thing
- But there are many cases where _ga is null while domain_userid is populated, so distinct domain_userid != distinct _ga.
What are your thoughts about this? We will also continue doing some more analysis on the GA cookie ID.
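One way to size the null cases, as a rough sketch over invented sample rows (the tuples below are assumptions, not real data), is to count the domain_userid values that never appear alongside a _ga value:

```python
# Hypothetical rows: (domain_userid, _ga cookie value or None) for each event.
rows = [
    ("sp-1", "GA1.2.111.1"),
    ("sp-2", None),
    ("sp-3", None),
    ("sp-4", "GA1.2.222.2"),
    ("sp-4", "GA1.2.222.2"),
]


def summarise(rows):
    """Compare distinct ID counts and measure the share of users with no GA cookie at all."""
    all_users = {user for user, _ in rows}
    users_with_ga = {user for user, ga in rows if ga is not None}
    ga_ids = {ga for _, ga in rows if ga is not None}
    users_without_ga = all_users - users_with_ga
    return {
        "distinct_domain_userid": len(all_users),
        "distinct_ga": len(ga_ids),
        "users_never_with_ga": len(users_without_ga),
        "share_never_with_ga": len(users_without_ga) / len(all_users),
    }


summary = summarise(rows)
print(summary)
```

If that share turns out to be close to the size of the UV gap, the null _ga cases would account for most of the difference.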