IP lookup ArrayIndexOutOfBoundsException


#1

Hi there,

I’m having trouble using the IP lookups enrichment. I have already configured the enrichment JSON file:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      }
    }
  }
}

But it does not work. The error:

{
  "line": "xxx",
  "errors": [
    {
      "level": "error",
      "message": "Could not extract geo-location from IP address [179.74.112.99]: [java.lang.ArrayIndexOutOfBoundsException]"
    }
  ],
  "failure_tstamp": "2016-12-20T14:31:03.962Z"
}

I also changed the content type of the *.dat files to binary/octet-stream, but the error still happens.

The error begins when I add the isp and/or organization. It works when only geo is enabled.

The files are OK; I have already validated them using the GeoIP gem.

What is wrong?

Thanks very much


#2

Hi Thiago,


#3

Hi @alex

Yes, my files are public. How can I host them privately and use them with Snowplow?


#4

Use an “s3://” path on a bucket that is not publicly viewable (but is accessible by the user running EMR).


#5

I changed to “s3://” and removed the “everybody” read permission, but only “geo” works. ISP or Organization gives the same error:

Could not extract geo-location from IP address [179.74.112.99]: [java.lang.ArrayIndexOutOfBoundsException]

#7

Can anyone help me?


#9

Can you re-post your full updated configuration?


#10

@alex sure

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": false,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      }
    }
  }
}

#11

Hi @thiagogsr - just use the correct S3 bucket names:

"uri": "s3://mybucket/third-party/maxmind"
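In other words, the part right after "s3://" should be the bare bucket name, not an endpoint hostname. A minimal Python sketch of a check for this, assuming the enrichment JSON has the shape posted earlier in the thread; `find_hostname_style_uris` is a hypothetical helper and not part of Snowplow:

```python
import json
from urllib.parse import urlparse


def find_hostname_style_uris(config_text):
    """Return (lookup name, uri) pairs whose s3:// URI names an
    endpoint hostname where a bare bucket name is expected."""
    params = json.loads(config_text)["data"]["parameters"]
    suspicious = []
    for name, lookup in params.items():
        parsed = urlparse(lookup["uri"])
        # For s3:// URIs, netloc is whatever sits in the bucket position.
        if parsed.scheme == "s3" and parsed.netloc.endswith(".amazonaws.com"):
            suspicious.append((name, lookup["uri"]))
    return suspicious


# Illustrative config: one bare bucket name, one hostname-style URI.
example = """
{
  "data": {
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://mybucket/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      }
    }
  }
}
"""

for name, uri in find_hostname_style_uris(example):
    print(f"{name}: {uri} looks like a hostname, not a bucket name")
```

This only inspects the text of the configuration; it does not contact S3 or verify permissions.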

#12

Hi @alex, thanks for your attention, but it still doesn’t work.

"errors": [
    {
        "level": "error",
        "message": "Could not extract geo-location from IP address [189.9.13.93]: [java.lang.ArrayIndexOutOfBoundsException]"
    }
],

My configuration file:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      }
    }
  }
}

#13

Using the AWS CLI, can you list the bucket contents please:

$ aws s3 ls s3://mybucketname/third-party/maxmind/

Thanks!


#14

Hi @alex,

Here is the output:

2016-12-19 18:24:14          0
2016-12-20 11:28:28   47721533 GeoIPCity.dat
2016-12-20 11:28:36    4189407 GeoIPISP.dat
2016-12-20 11:28:44   20307977 GeoIPOrg.dat
2016-12-20 11:33:10   17760694 GeoLiteCity.dat
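
The listing can be cross-checked against the "database" names in the configuration. A minimal Python sketch, assuming the four-column `aws s3 ls` layout shown above (date, time, size, name); `parse_s3_listing` and `missing_databases` are hypothetical helper names:

```python
def parse_s3_listing(listing_text):
    """Map object name -> size in bytes from `aws s3 ls` output."""
    sizes = {}
    for line in listing_text.splitlines():
        parts = line.split()
        # Rows without a name (the folder placeholder) are skipped.
        if len(parts) < 4:
            continue
        sizes[" ".join(parts[3:])] = int(parts[2])
    return sizes


def missing_databases(listing_text, expected_names):
    """Return the expected database files that are absent or empty."""
    sizes = parse_s3_listing(listing_text)
    return [name for name in expected_names if sizes.get(name, 0) == 0]


listing = """\
2016-12-19 18:24:14          0
2016-12-20 11:28:28   47721533 GeoIPCity.dat
2016-12-20 11:28:36    4189407 GeoIPISP.dat
2016-12-20 11:28:44   20307977 GeoIPOrg.dat
2016-12-20 11:33:10   17760694 GeoLiteCity.dat
"""

expected = ["GeoIPCity.dat", "GeoIPISP.dat", "GeoIPOrg.dat"]
print(missing_databases(listing, expected))
```

An empty result only shows that the files exist under that prefix with a non-zero size; it says nothing about whether their contents are valid.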

#15

I’m using version 0.9.0.


#16

@alex is there a way to fix it?


#17

I’m not sure what’s wrong with your setup @thiagogsr - it still feels like a permissions problem to me.

Are you definitely using the same creds in Stream Enrich and at the command line with aws CLI?


#18

It is strange, because the Geo dat file is in the same path as the ISP and Organization dat files, and it works. I will check again, thanks.


#19

Having the same issue here using Scala Stream Enrich:

"Could not extract geo-location from IP address [xx.xx.xx.xx]: [java.lang.ArrayIndexOutOfBoundsException: 15796200

{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoLiteCity.dat",
                "uri": "s3://xxxxxxxxx-snow-plow-assets/third-party/maxmind"
            }
        }
    }
}

I can list from CLI:

2017-01-25 13:30:29 0
2017-01-25 14:04:41 17775436 GeoLiteCity.dat

The user has permissions on S3; however, in the IAM Management console, the Access Advisor tab for that user shows no activity on S3:

Not accessed in the tracking period

Which is weird. I am using Stream Enrich 0.10.0. Any ideas would be welcome.

Thanks,

Nir


#20

I tried a lot of troubleshooting, and what finally helped was downloading the db via

yum install GeoIP GeoIP-data

Then I put the ip_geo file in the enrichment folder, which made Stream Enrich download the file. After that, the process ran correctly and finished the job successfully.

I would assume some sort of cache is causing this.

I have opened up a ticket:

https://github.com/snowplow/snowplow/issues/3083
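
If a stale or partially downloaded local copy really is the cause (an assumption, not something confirmed in this thread), comparing the local file's size with the size reported by `aws s3 ls` would expose it. A minimal Python sketch; `local_copy_matches` is a hypothetical helper, and the temporary file stands in for wherever the enrichment keeps its local copy:

```python
import os
import tempfile


def local_copy_matches(local_path, expected_size):
    """True when the local database file exists and has the expected byte size."""
    return os.path.isfile(local_path) and os.path.getsize(local_path) == expected_size


# Demonstration with a stand-in file that is far smaller than the S3 object.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "GeoLiteCity.dat")
    with open(path, "wb") as handle:
        handle.write(b"\0" * 1024)  # simulates a truncated download
    # 17775436 is the size S3 reported for GeoLiteCity.dat in the listing above.
    print(local_copy_matches(path, 17775436))
```

A size match does not prove the file is intact, but a mismatch is a cheap signal that the local copy should be deleted and fetched again.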


#21

It is working now after downloading the updated files from the MaxMind dashboard.

  • GeoIP-111_20170103

  • GeoIP-121_20170103

  • GeoIP-133_20170117

      {
        "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
        "data": {
          "name": "ip_lookups",
          "vendor": "com.snowplowanalytics.snowplow",
          "enabled": true,
          "parameters": {
            "geo": {
              "database": "GeoIPCity.dat",
              "uri": "s3:///mybucket/third-party/maxmind"
            },
            "isp": {
              "database": "GeoIPISP.dat",
              "uri": "s3:///mybucket/third-party/maxmind"
            },
            "organization": {
              "database": "GeoIPOrg.dat",
              "uri": "s3:///mybucket/third-party/maxmind"
            }
          }
        }
      }