How to validate the Snowplow IP lookups enrichment is working


#1

One of the enrichments Snowplow offers is IP lookups enrichment using the MaxMind database.

Sometimes you might see that not all values of the IP lookups enrichment are being populated. One possible reason is that the MaxMind database does not hold all of the information for that IP address.

You can check what information the MaxMind database has and compare it with the data Snowplow gives you. This helps you determine whether the information is simply not known (in which case the enrichment is working) or whether there is an issue with the enrichment itself.

You can install the GeoIP command-line tools (on Debian or Ubuntu) like this:

$ sudo apt-get install geoip-bin

Check that it is working:

$ example_ip="xxx.xxx.xxx.xxx"
$ geoiplookup ${example_ip}

(replace xxx.xxx.xxx.xxx with the IP address you want to check).

This should show you the results of the GeoIP Country Edition (it only returns the country code and the country name).

Testing the Snowplow provided free version

If you are using the free version provided by Snowplow, you can download a copy using the AWS CLI like this:

$ aws s3 cp s3://snowplow-hosted-assets/third-party/maxmind/GeoLiteCity.dat .

If you don’t want to use the AWS CLI, you can also download it with wget:

$ wget http://snowplow-hosted-assets.s3.amazonaws.com/third-party/maxmind/GeoLiteCity.dat

You can then check if the information in the database matches with the results from Snowplow:

$ geoiplookup -f GeoLiteCity.dat ${example_ip}

Here is an example using one of Google’s public DNS servers (8.8.8.8):

$ geoiplookup -f GeoLiteCity.dat 8.8.8.8

The results should look like this:

GeoIP City Edition, Rev 1: US, CA, California, Mountain View, 94035, 37.386002, -122.083801, 807, 650
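
To make the comparison with Snowplow easier, the line above can be split into labelled fields. The following is only a rough sketch: parse_geoip_city is a hypothetical helper (not part of geoiplookup or Snowplow), the labels are chosen to mirror the geo_* columns in Snowplow's events table, and it assumes the City Edition field order shown above and that no value contains ", ". The two trailing values are ignored.

```shell
# Hypothetical helper: split one line of geoiplookup City Edition output
# into labelled fields named after Snowplow's geo_* columns.
# Assumes the field order shown in the example output above.
parse_geoip_city() {
  # Drop everything up to the first ": " (the edition prefix),
  # then split the remainder on ", ".
  printf '%s\n' "$1" | sed 's/^[^:]*: //' | awk -F', ' '{
    print "geo_country=" $1
    print "geo_region=" $2
    print "geo_region_name=" $3
    print "geo_city=" $4
    print "geo_zipcode=" $5
    print "geo_latitude=" $6
    print "geo_longitude=" $7
  }'
}
```

It could be used like this:

$ parse_geoip_city "$(geoiplookup -f GeoLiteCity.dat 8.8.8.8)"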

Testing the MaxMind subscription version

If you have a MaxMind subscription you can download a copy of your database with the AWS CLI like this:

$ aws s3 cp s3://location-of-your-assets-bucket-maxmind/GeoIPCity.dat .

You can check the information in the same way:

$ geoiplookup -f GeoIPCity.dat ${example_ip}
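
If you want to check many addresses at once (for example, IPs taken from events where the geo fields are empty), a small loop can help. This is a minimal sketch under a few assumptions: lookup_ips is a hypothetical helper, the input is a plain text file with one IP address per line, and the LOOKUP_CMD variable is something introduced here so that a different lookup command can be swapped in. It defaults to geoiplookup.

```shell
# Hypothetical helper: look up every IP address listed in a file
# (one address per line) against a given MaxMind database file.
# LOOKUP_CMD is an assumption of this sketch; it defaults to geoiplookup.
lookup_ips() {
  ips_file="$1"
  db_file="${2:-GeoLiteCity.dat}"
  while IFS= read -r ip; do
    # Skip blank lines.
    [ -z "$ip" ] && continue
    # Print the IP, a tab, then whatever the lookup command returns.
    printf '%s\t%s\n' "$ip" "$(${LOOKUP_CMD:-geoiplookup} -f "$db_file" "$ip")"
  done < "$ips_file"
}
```

For example, with the addresses saved in a file called ips.txt (a placeholder name):

$ lookup_ips ips.txt GeoLiteCity.dat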

#2

Hello!

Leon, thank you for your explanation.
I found that about 30% of the events do not have geo_city.
For some of those events, though, I am able to find a city by using MaxMind’s Python API, which means that those IP addresses are in the database but the enrichment somehow seems unable to look them up.

Any idea of why this is happening?

Thanks!

Boris