In the general case, the question of why GA and Snowplow numbers differ is fairly difficult to answer. At a high level it usually just comes down to ‘there are differences in logic between the two products’.
Fundamentally (and still at a very high level), the difference in logic generally boils down to a difference in philosophy. Leaving aside the sampling problem, the GA approach is to remove the need for the user to reason about the ‘raw’ data (at least to some extent), and to carry out at least some aggregation or logic ‘under the hood’. The advantage is that it makes the data more accessible, but the downside is that the user has no access to the logic/decisions made about the data under the hood - so if they want to handle it differently they either can’t, or they need to be pretty creative.
With Snowplow, our approach is to focus on collection, and leave all of the business logic to the user. A colleague recently phrased this idea as ‘we just throw it over the wall and let them worry about the rest’. No decisions are made for you.
You’ve presented some interesting findings here, so hopefully I can prod you towards a better understanding of these differences. Quite likely this difference in approach is at play: where we’re throwing stuff over the wall ‘as is’, GA may be amending, aggregating, or otherwise handling the data before it’s presented to you (obviously it’s not possible to determine exactly what GA is doing).
Additionally, sessionisation in Snowplow is configurable - you can specify the session timeout in the tracker, so it won’t necessarily match whatever GA is using.
So, I would explore a few avenues to learn more about this:
Are there Snowplow sessions with only one event? Perhaps there are users who return to a tab much later, trigger an event by opening the tab, then close it and leave. If GA disregards these, or attributes them to existing sessions, then (depending on how the tracker is instrumented) you may see single-event sessions in Snowplow that have no counterpart in GA. Our philosophy is that it’s best to let you decide whether to filter those out.
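To make the single-event check concrete, here is a rough sketch of the kind of count I have in mind. It assumes you have pulled one row per event with a user ID and a session ID (for example domain_userid and domain_sessionid); the sample rows and the function name are made up for illustration, and in practice you would likely run the equivalent query in your warehouse.

```python
from collections import Counter

# Hypothetical sample rows: one (user_id, session_id) pair per event.
events = [
    ("user-a", "s1"),
    ("user-a", "s1"),
    ("user-a", "s2"),
    ("user-b", "s3"),
    ("user-b", "s3"),
    ("user-b", "s3"),
    ("user-c", "s4"),
]


def single_event_sessions(rows):
    """Return the sorted IDs of sessions that contain exactly one event."""
    counts = Counter(session_id for _, session_id in rows)
    return sorted(sid for sid, n in counts.items() if n == 1)


singles = single_event_sessions(events)
total_sessions = len({session_id for _, session_id in events})
share = len(singles) / total_sessions

print(f"{len(singles)} of {total_sessions} sessions have a single event ({share:.0%})")
```

If that share is roughly the size of the gap you are seeing against GA, single-event sessions are a good candidate explanation.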
Is there a difference in the tracking configuration? If the session timeout in the Snowplow tracker is configured differently to GA’s sessionisation then obviously we can see a difference (this is fairly likely). But also, if GA is configured to trigger events at times when Snowplow isn’t, then GA could be attributing one session where Snowplow attributes two. Think of someone watching a 1hr video: if GA is firing events during that hour but Snowplow isn’t, the gap between Snowplow events can exceed the session timeout, so GA will consider it a single session where Snowplow will consider it two.
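The video example can be sketched with a simplified model of inactivity-based sessionisation. This is only an illustration: the 30-minute timeout, the event timings and the function name are assumptions, and neither product's real logic is this simple.

```python
def count_sessions(event_minutes, timeout_minutes=30):
    """Count sessions, starting a new one whenever the gap between
    consecutive events is longer than the timeout (simplified model)."""
    sessions = 0
    previous = None
    for t in sorted(event_minutes):
        if previous is None or t - previous > timeout_minutes:
            sessions += 1
        previous = t
    return sessions


# Assumed timings, in minutes since the video started.
# Tracker with no events during playback: one at the start, one at the end.
sparse_events = [0, 60]
# Tracker that also fires an event every 10 minutes during playback.
dense_events = list(range(0, 61, 10))

print(count_sessions(sparse_events))  # gap of 60 min exceeds the timeout
print(count_sessions(dense_events))   # no gap exceeds the timeout
```

Under these assumptions the sparse stream is split into two sessions and the dense one stays as a single session, even though the user did exactly the same thing in both cases.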
I hope that’s helpful stuff. I’m glad to hear you don’t want to reconcile them - that kind of exercise is normally a massive time sink for no real value. You’re right to aim towards understanding the difference and factoring that into how you come to conclusions.