External filtering in Snowplow javascript trackers

achintya.c · January 17, 2022, 12:00pm

Hi

This is more of a question about whether my idea can be used with Snowplow trackers. I would like to filter the data using an external API so that Snowplow collects only specific types of data. I tried sending data to my AWS collector through a third-party URL, but it is not working.

My question is: is my idea valid? Will Snowplow allow such filtering?

Remember, I am only trying this for web trackers.

mike · January 17, 2022, 8:48pm

Can you give an example of what sort of filtering logic you are trying to employ? Some filtering may require an external service vs some filtering that may be more appropriate at either the tracker or load balancer level.

achintya.c · January 25, 2022, 7:56am

My filtering is external. Let me explain.

1. My collector is ‘c1.example.com’.
2. My filtering service is running at ‘filter.example.com’.
3. I am sending tracking data to ‘filter.example.com’, and this filtering server redirects all the hits to my collector URL (c1.example.com).

The issue is that I am getting 404 errors.

achintya.c · February 3, 2022, 8:50am

Hi

Is there any documentation available for the file sp.js? We can try modifying it to allow different filtering using the same file.

mike · February 3, 2022, 10:06am

The documentation for the JavaScript tracker is here. Depending on what filtering you want to do, we may be able to advise on the best approach.

achintya.c · February 3, 2022, 11:23am

Hi

I have gone through the JavaScript tracker documentation. I now want to do the following filtering:

1. Stop tracking data from users in certain countries.
2. Stop some text that does not comply with our reporting system, such as ‘ợc+hợp+nhất|’. These texts break reporting files such as CSV and TSV.
3. Block certain kinds of content, for example porn.

Thanks for your time and effort.

mike · February 3, 2022, 9:01pm

Although you can do this in JavaScript (by calling an API to look up the likely country for an IP address), this is probably easier to do at the load balancer / CDN level by serving an empty JavaScript file in place of the tracker. Avoid blocking the Snowplow requests directly, as this will cause them to queue up in the user's local storage.

Do you know where this data is coming from? Everything in the Snowplow pipeline should be UTF-8, so this shouldn’t break loading or processing in any part of the pipeline, but you may want to filter it out somewhere depending on how it’s being sent and whether it’s expected (you could likely do this in your schema definitions).
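As a rough illustration of filtering such values before they reach a report, the sketch below strips characters that commonly break delimited files. It assumes the delimiters (tab, pipe, line breaks) are what breaks the CSV/TSV export rather than the non-ASCII letters themselves. `sanitizeForReport` and `hasUnsafeDelimiter` are hypothetical helpers, not part of any Snowplow library.

```javascript
// Hypothetical helpers: not part of the Snowplow tracker or pipeline.
// Assumption: tabs, pipes, and line breaks break the report files; the
// non-ASCII letters are valid UTF-8 and are left untouched.
function sanitizeForReport(value) {
  if (value == null) return "";
  return String(value)
    .replace(/[\t\r\n|]+/g, " ") // swap delimiter characters for a space
    .trim();
}

// True when a value contains a character that would split a TSV/pipe row.
function hasUnsafeDelimiter(value) {
  return /[\t\r\n|]/.test(String(value));
}
```

The same character class could be expressed as a `pattern` constraint in a schema, so that offending values are rejected during validation instead of being cleaned afterwards.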

Is it porn URLs that are being sent through in fields or something else? This can be a trickier one to remove but I suspect the best place to do this would be a custom enrichment that flags adult sites and removes / redacts or drops the event depending on your desired behaviour.
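A minimal sketch of what such a custom enrichment could look like, written for the JavaScript enrichment. It assumes the adult content shows up in the page URL, uses a hard-coded blocklist with a made-up domain, and attaches a derived context under a hypothetical schema (`iglu:com.example/content_flag/jsonschema/1-0-0`) that you would need to define yourself.

```javascript
// Assumption: a tiny hard-coded blocklist with a made-up domain. A real
// setup would need a maintained list or a classification service.
var BLOCKED_HOSTS = ["adult.example.com"];

// Extract the lower-cased hostname with a regex, since the URL class may
// not be available in the enrichment's JavaScript engine.
function hostOf(url) {
  var match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(String(url));
  return match ? match[1].toLowerCase() : null;
}

// Returns the derived contexts to attach to the event: one flag context
// when the page host is on the blocklist, otherwise none.
function flagAdultContent(event) {
  var host = hostOf(event.getPage_url());
  if (host !== null && BLOCKED_HOSTS.indexOf(host) !== -1) {
    return [{
      schema: "iglu:com.example/content_flag/jsonschema/1-0-0", // hypothetical
      data: { flagged: true, host: host }
    }];
  }
  return [];
}

// The enrichment expects an entry point named `process`, so the script
// deployed to Enrich would end with:
//   function process(event) { return flagAdultContent(event); }
```

Flagging keeps the event so it can be removed or redacted downstream. To discard it during enrichment instead, throwing from the script should send the event to failed events, though it is worth checking that behaviour against the docs for your Enrich version.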

achintya.c · February 7, 2022, 2:43pm

Thanks for your suggestions.

achintya.c · February 9, 2022, 3:36pm

Can we use the IP lookup enrichment to block some countries/cities? If not, what is this enrichment used for?

mike · February 9, 2022, 8:47pm

The IP lookup enrichment runs after collection so its primary use is to add geographic information to an event for analysis and filtering - rather than blocking.

If you want to stop events before they are collected this depends a bit more on the use case e.g., do you want to stop events because you don’t have consent to collect or you just don’t want to collect for some other reason?

Depending on this, you could look at blocking countries at the CDN level (with the warning that this will still be an approximation based on the IP address), or alternatively you could run some client-side code that retrieves the country for an IP address using an API (such as ipify) and then determines whether the tracker should be initialised or not.
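The client-side option could be sketched roughly as follows. The country lookup is passed in as a function so any geolocation API can be used; the endpoint in the usage comment, the country codes, and the `appId` are placeholders, and treating a failed lookup as "blocked" is an assumption that suits the consent case but may not suit yours.

```javascript
// Placeholder country codes: replace with the ISO codes you want to exclude.
const BLOCKED_COUNTRIES = ["XX", "YY"];

function isCountryBlocked(countryCode, blocked) {
  if (!countryCode) return true; // unknown country: assume blocked
  return blocked.includes(String(countryCode).toUpperCase());
}

// lookupCountry: async function resolving to a country code.
// snowplow: the tracker's global function (window.snowplow in the browser).
async function initTrackerIfAllowed(lookupCountry, snowplow) {
  let country = null;
  try {
    country = await lookupCountry();
  } catch (err) {
    // Lookup failed: country stays null, so the tracker is not started.
  }
  if (isCountryBlocked(country, BLOCKED_COUNTRIES)) return false;
  snowplow("newTracker", "sp", "c1.example.com", { appId: "my-app" });
  return true;
}

// Browser usage, with a hypothetical lookup endpoint and response shape:
//   initTrackerIfAllowed(
//     () => fetch("https://geo.example.com/country")
//             .then((r) => r.json())
//             .then((d) => d.country),
//     window.snowplow
//   );
```

Because the tracker is never initialised for blocked visitors, no requests are queued in local storage, which avoids the problem with blocking collector requests directly.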

achintya.c · February 21, 2022, 8:11am

I suppose that by CDN level you mean the application level? Do you think we can filter on the Snowplow side by adding a custom enrichment using JavaScript?

mike · February 21, 2022, 9:59am

You could filter in the enrichment process but you may be better off with using the API enrichment rather than the Javascript enrichment so you can easily change out resources + databases as they get updated.

achintya.c · February 21, 2022, 10:41am

Can you please explain how this API enrichment can be implemented?

Thanks again.

achintya.c · February 22, 2022, 8:59am

I suppose you are referring to this as the API enrichment:

mike · February 22, 2022, 8:19pm

Yes - you would send the parts of the event that you want to filter to this API and then you could flag the events appropriately and remove them from your database.
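To make the API side concrete, here is a sketch of the lookup logic such a service might expose. Everything in it is an assumption for illustration: the `url` query parameter, the response shape, and the blocklist with its made-up domain. It is written as a plain function so it can be wired into whichever HTTP framework you use.

```javascript
// Hypothetical blocklist; a real service would query a maintained database,
// which is what makes it easy to update without touching the pipeline.
const FLAGGED_HOSTS = new Set(["adult.example.com"]);

// Takes the query parameters of a GET request and returns the HTTP status
// and JSON body the service would respond with.
function handleLookup(query) {
  const raw = query && query.url;
  if (!raw) return { status: 400, body: { error: "missing url parameter" } };
  let host;
  try {
    host = new URL(raw).hostname.toLowerCase();
  } catch (err) {
    return { status: 400, body: { error: "invalid url" } };
  }
  return { status: 200, body: { host, flagged: FLAGGED_HOSTS.has(host) } };
}
```

The enrichment would then be configured to call this endpoint with the page URL and attach the `flagged` value to the event as a context, so flagged rows can be filtered out or deleted in the database.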
