Snowplow and GA are reporting different visitor numbers

That will only get rid of a very limited set of bots and spiders - see

http://discourse.snowplow.io/t/excluding-bots-from-queries-in-redshift-tutorial/127

for details.
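To illustrate why the coverage is limited: a user agent filter of that kind is typically just a pattern match in the WHERE clause - something along these lines (an illustrative sketch against the standard atomic.events table, not the exact query from that tutorial):

    -- Exclude events whose user agent advertises itself as a bot/crawler
    SELECT COUNT(DISTINCT domain_userid) AS unique_visitors
    FROM atomic.events
    WHERE useragent NOT ILIKE '%bot%'
      AND useragent NOT ILIKE '%crawl%'
      AND useragent NOT ILIKE '%spider%'
      AND useragent NOT ILIKE '%slurp%';

Any bot that doesn't identify itself in the user agent string will slip straight through a filter like that, hence the limitation.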

There is a list (which costs a few thousand dollars to buy) that vendors like GA and Adobe can use to filter bots, and I’d expect it to be more extensive than the list bundled with the user agent parsing libraries. I’d also expect Google to have proprietary tech for spotting bots. So it’s possible that these bots account for the difference.

I’d still do the check by page URL. If the discrepancy is roughly constant across pages (or bigger for pages that are more likely to be crawled), that would suggest bots account for the discrepancy. If the discrepancy is skewed towards particular page URLs, that suggests an implementation issue.
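If it helps, something like the query below (a rough sketch - adjust the table name, event filter and date range to suit your setup) gives per-page visitor and page view counts that you can line up against the equivalent GA report:

    -- Unique visitors and page views per page URL for a given period,
    -- to compare against the same breakdown in GA
    SELECT
        page_urlpath,
        COUNT(DISTINCT domain_userid) AS unique_visitors,
        COUNT(*) AS page_views
    FROM atomic.events
    WHERE event = 'page_view'
      AND collector_tstamp >= '2016-01-01'
      AND collector_tstamp <  '2016-02-01'
    GROUP BY 1
    ORDER BY unique_visitors DESC;

Pages where the Snowplow numbers are much higher than GA’s are the ones worth inspecting for bot traffic (e.g. by looking at the useragent values for those page views).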