Filtering events from specific IPs

#1

Hey,

I was wondering if there is a way to filter events from specific IPs BEFORE they are loaded into the target.

Before GDPR we could filter these IPs in the queries, but now we are using the IP anonymization enrichment.

Would love to hear what you think.

Thanks.


#2

@moshesh, you can use the JavaScript script enrichment for that purpose. It runs before the PII enrichment, so the IPs are still visible to it. You can also use data from the IP lookup enrichment, which produces values such as geo_country, to filter events by their country of origin.
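
A minimal sketch of such a filter, assuming the enrichment's `process(event)` entry point, that the IP has not already been masked by an earlier enrichment, and that throwing an exception from the script sends the event to bad rows instead of the target (worth verifying against your Snowplow version). The `BLOCKED_IPS` list and its addresses are hypothetical.

```javascript
// Hypothetical list of IPs whose events should not reach the target.
var BLOCKED_IPS = ['203.0.113.7', '198.51.100.23'];

function process(event) {
  // Coerce to a JS string: in the enrichment runtime the getter may return a Java String.
  var ip = String(event.getUser_ipaddress());

  if (BLOCKED_IPS.indexOf(ip) !== -1) {
    // Assumption: an exception here fails enrichment for this event,
    // so it ends up in bad rows rather than in the target.
    throw new Error('Event dropped: blocked IP');
  }

  // No derived contexts to attach.
  return [];
}
```

The same condition could test `event.getGeo_country()` instead if you want to filter by country.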

Do remember that the raw data (if the batch pipeline is used) would still contain the IP address in S3. You might want to set up lifecycle rules so the data gets flushed after a reasonable number of days.


#3

Thanks @ihor!

The JavaScript enrichment sounds like a perfect solution.


#4

For a less permanent or retroactive application, you could try this script:

If you know your salt and the IP addresses, and push them through the same cryptographic hash function, you will get the same output that the Snowplow enrichment emits.

Then you can just filter by hashed user_ipaddress in SQL.
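
As a rough illustration of that idea (a sketch, not the script referred to above), the following Node.js snippet hashes a list of known IPs and prints them as a SQL list. It assumes SHA-256 as the hash function and that the salt is appended to the clear-text IP before hashing; check both against your own enrichment configuration. The salt and IP values are hypothetical.

```javascript
var crypto = require('crypto'); // Node.js built-in module

// Assumptions: SHA-256 as the hash function and the salt appended to the
// clear-text IP. Verify both against your enrichment configuration.
function hashIp(ip, salt) {
  return crypto.createHash('sha256').update(ip + salt, 'utf8').digest('hex');
}

// Hypothetical values for illustration only.
var salt = 'my-secret-salt';
var ipsToExclude = ['203.0.113.7', '198.51.100.23'];

var hashedIps = ipsToExclude.map(function (ip) {
  return hashIp(ip, salt);
});

// Build the list for a SQL "user_ipaddress NOT IN (...)" clause.
var sqlList = hashedIps.map(function (h) {
  return "'" + h + "'";
}).join(', ');

console.log(sqlList);
```

If the hashes do not match what is stored in user_ipaddress, the hash function or the way the salt is combined is probably different from what is assumed here.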


#5

Thanks @robkingston, but this is not what I’m looking for.

OK, so my original question was about filtering events.
The JavaScript enrichment is great for that, but we’ve decided that it won’t be good to filter the events; we prefer to mark them in some way.

Let’s say that we want to use a field that is not being used by us, e.g. ip_organization.
According to the example in the JavaScript enrichment documentation, I should return an array of objects with schema and data:

return [ { schema: "iglu:com.acme/derived_app_id/jsonschema/1-0-0",
           data: { appIdUpper: appIdUpper } } ];

Every attempt I made to get it working failed.

What schema should I use for this?
What should the data JSON object look like?

Any help would be highly appreciated.


#6

@moshesh, if I understood your goal correctly, you do not want to filter events by IP/geo_country but rather mark such events, using the field ip_organization for that purpose.

This can be achieved by mutating the field. No custom JSON schema (aka returned contexts) is required for that. Here’s how you could do it:

function process(event) {
    if (event.getUser_ipaddress() ...) {   // set your condition
        event.setIp_organization(new String('TO BE FILTERED')); // mark it the way you want
    }
}
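
For illustration, here is one way the placeholder condition could be filled in, as a sketch only. The `MARKED_IPS` list and its addresses are hypothetical; the getter and setter are the ones used in the snippet above.

```javascript
// Hypothetical list of IPs whose events should be marked rather than dropped.
var MARKED_IPS = ['203.0.113.7', '198.51.100.23'];

function process(event) {
  // Coerce to a JS string so the comparison also works if the getter returns a Java String.
  var ip = String(event.getUser_ipaddress());

  if (MARKED_IPS.indexOf(ip) !== -1) {
    // Reuse the otherwise unused ip_organization field as a marker.
    event.setIp_organization(new String('TO BE FILTERED'));
  }
}
```

Downstream queries can then exclude rows where ip_organization equals the marker value.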

#7

Many thanks @ihor, this was really helpful.
I didn’t know that the event is mutable.

We don’t use the PII enrichment, only the IP anonymization enrichment.
At first it didn’t work because of this enrichment, but once I removed it everything worked as expected.
I added the IP anonymization logic to my JS script.
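
Roughly, the combined script could look like the sketch below. This is not the exact production script: the `MARKED_IPS` list, the `ANON_OCTETS` setting and the `anonymizeIpv4` helper are hypothetical, and replacing trailing octets with "x" is an assumption about the masking format you want to reproduce.

```javascript
// Hypothetical settings: which IPs to mark and how many trailing octets to mask.
var MARKED_IPS = ['203.0.113.7'];
var ANON_OCTETS = 2;

// Replace the last `octets` parts of an IPv4 address with "x".
// Anything that is not a dotted IPv4 address (e.g. IPv6) is returned unchanged,
// so it would need separate handling.
function anonymizeIpv4(ip, octets) {
  var parts = ip.split('.');
  if (parts.length !== 4) {
    return ip;
  }
  for (var i = Math.max(0, 4 - octets); i < 4; i++) {
    parts[i] = 'x';
  }
  return parts.join('.');
}

function process(event) {
  var rawIp = event.getUser_ipaddress();
  if (rawIp === null || rawIp === undefined) {
    return;
  }
  var ip = String(rawIp);

  // 1. Mark first, while the full IP is still available.
  if (MARKED_IPS.indexOf(ip) !== -1) {
    event.setIp_organization(new String('TO BE FILTERED'));
  }

  // 2. Then mask the IP so the full address does not reach the target.
  event.setUser_ipaddress(anonymizeIpv4(ip, ANON_OCTETS));
}
```

The order matters: the marking check has to run on the full address, which is why the built-in IP anonymization enrichment had to be removed.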

Thanks :)


#8

@moshesh, glad it worked out for you. Snowplow is a very flexible analytics platform with lots of “hidden” gems.

In case you are curious, here’s the order in which enrichments are executed: https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-common-enrich/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/enrichments/EnrichmentRegistry.scala#L139-L177. Indeed, IP anonymization precedes the JavaScript enrichment.
