Sampling Snowplow hits


#1

I was trying to find more documentation on this and would appreciate assistance.
We have 2 trackers (PROD and STG) that send data to S3 and then we use Stream Enrich on that data.
We are looking to scale down data collection on the STG tracker to 10% to reduce processing and storage costs.
What is the best way to approach it?


#2

Hi @mbondarenko,

I would add the staging tracker for only 10% of users. It is simple and efficient, and it takes very few resources to introduce.


#3

How would you do that? I was initially looking to set it up similarly to Google Analytics, but I do not see any parameters that would do that. Any guidance or a link to documentation is appreciated.


#4

I will use some custom JS to get a 10% hit rate, but I was wondering whether there is a built-in feature on either the tracker or the enricher side that works the same way as it does in Google Analytics.


#5

Not yet, though there is an issue tracking this on GitHub. For the moment, if you're only using the JS tracker, the best way would be to extend the tracker to enable sampling.


#6

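I would go with the simplest possible approach:

// The PROD tracker is always created.
window.snowplow('newTracker', 'PROD', '{{PROD-COLLECTOR-URI}}', {
    appId: '{{MY-SITE-ID}}'
});

// Math.random() returns a value in [0, 1), so this is true on roughly 10% of page loads.
if (Math.random() > 0.9) {
    window.snowplow('newTracker', 'STG', '{{STG-COLLECTOR-URI}}', {
        appId: '{{MY-SITE-ID}}'
    });
}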
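
One caveat: the Math.random() check is re-evaluated on every page load, so the same visitor can be in the STG sample on one page and out of it on the next. If you would rather sample 10% of users, a possible variation is to hash a stable per-browser id and compare that to the rate. This is only a rough sketch, not a built-in Snowplow feature: hashToUnit, inSample and the 'stgSampleId' localStorage key are names I made up for illustration, and the hash (32-bit FNV-1a) could be swapped for any stable hash.

```javascript
// Hypothetical helper: maps a string id to a stable number in [0, 1)
// using 32-bit FNV-1a hashing.
function hashToUnit(id) {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 4294967296;
}

// Hypothetical helper: true for roughly `rate` of all ids,
// and always the same answer for the same id.
function inSample(id, rate) {
  return hashToUnit(id) < rate;
}

// Browser-only part: skipped when window.snowplow is not available.
if (typeof window !== 'undefined' && typeof window.snowplow === 'function') {
  var key = 'stgSampleId'; // assumed storage key, not a Snowplow setting
  var id = window.localStorage.getItem(key);
  if (!id) {
    // Any value that stays fixed for this browser will do.
    id = String(Math.random()) + String(Date.now());
    window.localStorage.setItem(key, id);
  }
  if (inSample(id, 0.1)) {
    window.snowplow('newTracker', 'STG', '{{STG-COLLECTOR-URI}}', {
      appId: '{{MY-SITE-ID}}'
    });
  }
}
```

With this, a given browser is either always in the STG sample or never in it, until its localStorage is cleared.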

#7

Thank you!