IP lookup ArrayIndexOutOfBoundsException

Hi there,

I’m having trouble using the IP lookups enrichment. I have already configured the enrichment JSON file:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      }
    }
  }
}

But it does not work. This is the error:

{
  "line": "xxx",
  "errors": [
    {
      "level": "error",
      "message": "Could not extract geo-location from IP address [179.74.112.99]: [java.lang.ArrayIndexOutOfBoundsException]"
    }
  ],
  "failure_tstamp": "2016-12-20T14:31:03.962Z"
}
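To see which IPs fail and with which exception, bad rows like the one above can be summarised with a short script. This is only a minimal sketch in Python, assuming each bad row is a single JSON object shaped like the example; the function name `summarise_bad_row` and the regular expression are illustrative and not part of Snowplow.

```python
import json
import re

# Matches the message format shown in the bad row above (an assumption:
# other enrichment failures may use different wording).
IP_ERROR = re.compile(
    r"Could not extract geo-location from IP address \[(?P<ip>[^\]]+)\]: "
    r"\[(?P<exception>[^\]]+)\]"
)


def summarise_bad_row(raw):
    """Return (ip, exception) pairs found in one bad-row JSON string."""
    row = json.loads(raw)
    pairs = []
    for error in row.get("errors", []):
        match = IP_ERROR.search(error.get("message", ""))
        if match:
            pairs.append((match.group("ip"), match.group("exception")))
    return pairs


if __name__ == "__main__":
    sample = json.dumps({
        "line": "xxx",
        "errors": [{
            "level": "error",
            "message": "Could not extract geo-location from IP address "
                       "[179.74.112.99]: "
                       "[java.lang.ArrayIndexOutOfBoundsException]",
        }],
        "failure_tstamp": "2016-12-20T14:31:03.962Z",
    })
    print(summarise_bad_row(sample))
```

Running this over a whole bad-rows file (one JSON object per line) would show whether every IP fails or only some of them.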

I also changed the content type of the *.dat files to binary/octet-stream, but the error still happens.

The error begins when I add the isp and/or organization. It works when only geo is enabled.

The files are OK; I have already validated them using the GeoIP gem.

What is wrong?

Thanks very much

Hi Thiago,

Hi @alex

Yes, my files are public. How can I host them privately and use with snowplow?

Use an “s3://” path on a bucket that is not publicly viewable (but is accessible by the user running EMR).

I changed to “s3://” and removed the “everybody” read permission, but only “geo” works. With ISP or Organization I get the same error:

Could not extract geo-location from IP address [179.74.112.99]: [java.lang.ArrayIndexOutOfBoundsException]

Can anyone help me?

Can you re-post your full updated configuration?

@alex sure

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": false,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      }
    }
  }
}

Hi @thiagogsr - just use the correct S3 bucket names:

"uri": "s3://mybucket/third-party/maxmind"
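The reason the endpoint-style value is wrong is that, in an `s3://` URI, the part immediately after the scheme is read as the bucket name. A minimal sketch in Python using only generic URI splitting (how Snowplow parses the value internally is not shown here, and `split_s3_uri` is a hypothetical helper):

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URI: " + uri)
    # The authority component is the bucket; the rest is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    for uri in (
        "s3://mybucket/third-party/maxmind",
        "s3://myendpoint.s3.amazonaws.com/third-party/maxmind",
    ):
        print(uri, "->", split_s3_uri(uri))
```

With the second URI the "bucket" comes out as `myendpoint.s3.amazonaws.com`, which is an HTTP hostname rather than a bucket name.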

Hi @alex, thanks for your attention, but it still doesn’t work.

"errors": [
    {
        "level": "error",
        "message": "Could not extract geo-location from IP address [189.9.13.93]: [java.lang.ArrayIndexOutOfBoundsException]"
    }
],

My configuration file:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      }
    }
  }
}

Can you list the bucket contents using the AWS CLI, please:

$ aws s3 ls s3://mybucketname/third-party/maxmind/

Thanks!

Hi @alex,

Here is the output:

2016-12-19 18:24:14          0
2016-12-20 11:28:28   47721533 GeoIPCity.dat
2016-12-20 11:28:36    4189407 GeoIPISP.dat
2016-12-20 11:28:44   20307977 GeoIPOrg.dat
2016-12-20 11:33:10   17760694 GeoLiteCity.dat

I’m using version 0.9.0.
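For anyone repeating this check, the comparison between the listing and the enrichment config can be automated. A rough sketch in Python, assuming the usual `date time size name` layout of `aws s3 ls` output and the config structure posted earlier in this thread; both function names are made up for illustration:

```python
def parse_s3_listing(text):
    """Map file name -> size in bytes from `aws s3 ls` output."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Expect: date, time, size, name. Lines without a name, such as
        # the zero-byte folder placeholder, are skipped.
        if len(parts) == 4 and parts[2].isdigit():
            sizes[parts[3]] = int(parts[2])
    return sizes


def missing_databases(config, listing_text):
    """Return configured database names that are absent or empty in the listing."""
    sizes = parse_s3_listing(listing_text)
    wanted = [p["database"] for p in config["data"]["parameters"].values()]
    return [name for name in wanted if sizes.get(name, 0) == 0]


if __name__ == "__main__":
    listing = """\
2016-12-19 18:24:14          0
2016-12-20 11:28:28   47721533 GeoIPCity.dat
2016-12-20 11:28:36    4189407 GeoIPISP.dat
2016-12-20 11:28:44   20307977 GeoIPOrg.dat
2016-12-20 11:33:10   17760694 GeoLiteCity.dat
"""
    config = {"data": {"parameters": {
        "geo": {"database": "GeoIPCity.dat"},
        "isp": {"database": "GeoIPISP.dat"},
        "organization": {"database": "GeoIPOrg.dat"},
    }}}
    print(missing_databases(config, listing))
```

An empty result only says the files are present and non-empty under the expected names; it says nothing about whether their contents are valid for the lookup library.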

@alex is there a way to fix it?

I’m not sure what’s wrong with your setup @thiagogsr - it still feels like a permissions problem to me.

Are you definitely using the same creds in Stream Enrich and at the command line with aws CLI?

It is strange, because the Geo dat file is in the same path as the ISP and Organization dat files, and it works. I will check again, thanks.

I’m having the same issue here using Scala Stream Enrich:

"Could not extract geo-location from IP address [xx.xx.xx.xx]: [java.lang.ArrayIndexOutOfBoundsException: 15796200

{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoLiteCity.dat",
                "uri": "s3://xxxxxxxxx-snow-plow-assets/third-party/maxmind"
            }
        }
    }
}

I can list from CLI:

2017-01-25 13:30:29 0
2017-01-25 14:04:41 17775436 GeoLiteCity.dat

The user has permissions on S3. However, in the IAM Management Console, the Access Advisor tab for that user shows no activity on S3:

Not accessed in the tracking period

Which is weird. I am using Stream Enrich 0.10.0. Any ideas would be welcome.

Thanks,

Nir

I tried a lot of troubleshooting, and what finally helped was downloading the database via:

yum install GeoIP GeoIP-data

Then I put the ip_geo file in the enrichment folder, which made Stream Enrich download the file. After that the process ran correctly and the job finished successfully.

I would assume some sort of cache is causing this.
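One cheap way to test that guess is to compare the size of the local copy against the size reported by the S3 listing. A small sketch in Python; the local path is hypothetical, and a matching size does not prove the contents are identical:

```python
import os


def local_copy_matches(path, expected_size):
    """True if `path` exists and has exactly `expected_size` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_size


if __name__ == "__main__":
    # Hypothetical local path; 17775436 is the size shown for
    # GeoLiteCity.dat in the S3 listing earlier in this post.
    print(local_copy_matches("./GeoLiteCity.dat", 17775436))
```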

I have opened up a ticket:

https://github.com/snowplow/snowplow/issues/3083

It is working now, after downloading the updated files from the MaxMind dashboard:

  • GeoIP-111_20170103

  • GeoIP-121_20170103

  • GeoIP-133_20170117

      {
        "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
        "data": {
          "name": "ip_lookups",
          "vendor": "com.snowplowanalytics.snowplow",
          "enabled": true,
          "parameters": {
            "geo": {
              "database": "GeoIPCity.dat",
              "uri": "s3:///mybucket/third-party/maxmind"
            },
            "isp": {
              "database": "GeoIPISP.dat",
              "uri": "s3:///mybucket/third-party/maxmind"
            },
            "organization": {
              "database": "GeoIPOrg.dat",
              "uri": "s3:///mybucket/third-party/maxmind"
            }
          }
        }
      }