Facing issue while using ip_lookups


#1

{
“line”: “CwBkAAAADjE5Mi4xNjguNTUuMjExCgDIAAABZGTNw+ILANIAAAAFVVRGLTgLANwAAAAQc3NjLTAuMTMuMC1rYWZrYQsBLAAAAHJNb3ppbGxhLzUuMCAoV2luZG93cyBOVCAxMC4wOyBXaW42NDsgeDY0KSBBcHBsZVdlYktpdC81MzcuMzYgKEtIVE1MLCBsaWtlIEdlY2tvKSBDaHJvbWUvNjcuMC4zMzk2Ljk5IFNhZmFyaS81MzcuMzYLATYAAAAmaHR0cDovL2xvY2FsaG9zdDo4MDgwL3NhbXBsZS9oZWxsby5qc3ALAUAAAAACL2kLAUoAAAHwc3RtPTE1MzA2OTkzOTE1NzEmZT1wdiZ1cmw9aHR0cCUzQSUyRiUyRmxvY2FsaG9zdCUzQTgwODAlMkZzYW1wbGUlMkZoZWxsby5qc3AmcGFnZT1TYW1wbGUlMjBBcHBsaWNhdGlvbiUyMEpTUCUyMFBhZ2UmdHY9anMtMi42LjImdG5hPWNmMiZhaWQ9NjliOGU0MzQ3Mzc0OTgwMCZwPXdlYiZ0ej1Bc2lhJTJGS29sa2F0YSZsYW5nPWVuLVVTJmNzPXdpbmRvd3MtMTI1MiZmX3BkZj0wJmZfcXQ9MCZmX3JlYWxwPTAmZl93bWE9MCZmX2Rpcj0wJmZfZmxhPTAmZl9qYXZhPTAmZl9nZWFycz0wJmZfYWc9MCZyZXM9MTM2Nng3NjgmY2Q9MjQmY29va2llPTEmZWlkPTgyOTM2MDI5LWM3YTEtNDU4Ny05M2ViLTUzYTMwYjEzMzkxOSZkdG09MTUzMDY5OTM5MTU3MCZ2cD03Njd4NjEzJmRzPTc2N3g2MTMmdmlkPTIwJnNpZD01ZjcyZjg2ZC00YmY0LTQxYzctOGQxMi1iYTA0ZjI4MmZjNGEmZHVpZD1jODQ4NWNhZS04OGZlLTQ4ZDktOGRlOS03MjI0YjljNzkyNzYmZnA9Mjc4NzEyNDY0MQ8BXgsAAAAIAAAAGUhvc3Q6IDE5Mi4xNjguNTUuMjI5OjEyMzQAAAAWQ29ubmVjdGlvbjoga2VlcC1hbGl2ZQAAAH5Vc2VyLUFnZW50OiBNb3ppbGxhLzUuMCAoV2luZG93cyBOVCAxMC4wOyBXaW42NDsgeDY0KSBBcHBsZVdlYktpdC81MzcuMzYgKEtIVE1MLCBsaWtlIEdlY2tvKSBDaHJvbWUvNjcuMC4zMzk2Ljk5IFNhZmFyaS81MzcuMzYAAAAyQWNjZXB0OiBpbWFnZS93ZWJwLCBpbWFnZS9hcG5nLCBpbWFnZS8qLCAqLyo7cT0wLjgAAAAvUmVmZXJlcjogaHR0cDovL2xvY2FsaG9zdDo4MDgwL3NhbXBsZS9oZWxsby5qc3AAAAAeQWNjZXB0LUVuY29kaW5nOiBnemlwLCBkZWZsYXRlAAAAIEFjY2VwdC1MYW5ndWFnZTogZW4tVVMsIGVuO3E9MC45AAAAG1RpbWVvdXQtQWNjZXNzOiA8ZnVuY3Rpb24xPgsBkAAAAA4xOTIuMTY4LjU1LjIyOQsBmgAAACRiYThmZWUzMi0xNjAzLTRiZjctYTVjOC0xZjE2MjQyOTRiZmILemkAAABBaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cvQ29sbGVjdG9yUGF5bG9hZC90aHJpZnQvMS0wLTAA”,
“errors”: [
{
“level”: “error”,
“message”: “Unexpected error processing events: com.maxmind.db.InvalidDatabaseException: Could not find a MaxMind DB metadata marker in this file (ip_geo). Is this a valid MaxMind DB file?\n\tat com.maxmind.db.Reader.findMetadataStart(Reader.java:278)\n\tat com.maxmind.db.Reader.(Reader.java:129)\n\tat com.maxmind.db.Reader.(Reader.java:116)\n\tat com.maxmind.geoip2.DatabaseReader.(DatabaseReader.java:66)\n\tat com.maxmind.geoip2.DatabaseReader.(DatabaseReader.java:54)\n\tat com.maxmind.geoip2.DatabaseReader$Builder.build(DatabaseReader.java:160)\n\tat com.snowplowanalytics.maxmind.iplookups.IpLookups$$anonfun$getService$1.apply(IpLookups.scala:111)\n\tat com.snowplowanalytics.maxmind.iplookups.IpLookups$$anonfun$getService$1.apply(IpLookups.scala:106)\n\tat scala.Option.map(Option.scala:146)\n\tat com.snowplowanalytics.maxmind.iplookups.IpLookups.getService(IpLookups.scala:106)\n\tat com.snowplowanalytics.maxmind.iplookups.IpLookups.(IpLookups.scala:91)\n\tat com.snowplowanalytics.maxmind.iplookups.IpLookups$.apply(IpLookups.scala:46)\n\tat com.snowplowanalytics.snowplow.enrich.common.enrichments.registry.IpLookupsEnrichment.ipLookups$lzycompute(IpLookupsEnrichment.scala:167)\n\tat com.snowplowanalytics.snowplow.enrich.common.enrichments.registry.IpLookupsEnrichment.ipLookups(IpLookupsEnrichment.scala:165)\n\tat com.snowplowanalytics.snowplow.enrich.common.enrichments.registry.IpLookupsEnrichment.extractIpInformation(IpLookupsEnrichment.scala:178)\n\tat com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$$anonfun$4$$anonfun$apply$2.apply(EnrichmentManager.scala:242)\n\tat com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$$anonfun$4$$anonfun$apply$2.apply(EnrichmentManager.scala:240)\n\tat scala.Option.map(Option.scala:146)\n\tat com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$$anonfun$4.apply(EnrichmentManager.scala:240)\n\tat 
com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$$anonfun$4.apply(EnrichmentManager.scala:239)\n\tat scala.Option.flatMap(Option.scala:171)\n\tat com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$.enrichEvent(EnrichmentManager.scala:239)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(EtlPipeline.scala:92)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(EtlPipeline.scala:91)\n\tat scalaz.NonEmptyList$class.map(NonEmptyList.scala:23)\n\tat scalaz.NonEmptyListFunctions$$anon$4.map(NonEmptyList.scala:207)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(EtlPipeline.scala:91)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(EtlPipeline.scala:88)\n\tat scalaz.Validation$class.map(Validation.scala:112)\n\tat scalaz.Success.map(Validation.scala:345)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1.apply(EtlPipeline.scala:88)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1.apply(EtlPipeline.scala:85)\n\tat scala.Option.map(Option.scala:146)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1.apply(EtlPipeline.scala:85)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1.apply(EtlPipeline.scala:82)\n\tat scalaz.Validation$class.map(Validation.scala:112)\n\tat scalaz.Success.map(Validation.scala:345)\n\tat com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$.processEvents(EtlPipeline.scala:82)\n\tat com.snowplowanalytics.snowplow.enrich.stream.sources.Source.enrichEvents(Source.scala:136)\n\tat com.snowplowanalytics.snowplow.enrich.stream.sources.Source$$anonfun$5.apply(Source.scala:161)\n\tat 
com.snowplowanalytics.snowplow.enrich.stream.sources.Source$$anonfun$5.apply(Source.scala:161)\n\tat scala.collection.immutable.List.flatMap(List.scala:338)\n\tat com.snowplowanalytics.snowplow.enrich.stream.sources.Source.enrichAndStoreEvents(Source.scala:161)\n\tat com.snowplowanalytics.snowplow.enrich.stream.sources.KafkaSource.run(KafkaSource.scala:90)\n\tat com.snowplowanalytics.snowplow.enrich.stream.Enrich$class.run(Enrich.scala:73)\n\tat com.snowplowanalytics.snowplow.enrich.stream.KafkaEnrich$.run(KafkaEnrich.scala:37)\n\tat com.snowplowanalytics.snowplow.enrich.stream.KafkaEnrich$.main(KafkaEnrich.scala:39)\n\tat com.snowplowanalytics.snowplow.enrich.stream.KafkaEnrich.main(KafkaEnrich.scala)\n”
}
],
“failure_tstamp”: “2018-07-04T10:19:25.085Z”
}

I get this error when the ip_lookups enrichment is enabled. I am using the latest ip_lookups schema, iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0
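For anyone triaging similar bad rows, here is a minimal Python sketch that pulls the first line of each error message out of a bad row, so the root cause is readable without the full stack trace. The helper name `summarize_bad_row` is made up for illustration, and the sample row is a shortened stand-in for the one above (the `line` value is a placeholder, not a real payload).

```python
import json
from typing import List


def summarize_bad_row(raw: str) -> List[str]:
    """Return the first line of every error message found in a bad row."""
    row = json.loads(raw)
    # Each message is "<summary>\n\tat <stack frame>..."; keep only the summary.
    return [err["message"].split("\n", 1)[0] for err in row.get("errors", [])]


# Shortened stand-in for the bad row in this post (placeholder "line" value).
sample = json.dumps({
    "line": "placeholder-base64-payload",
    "errors": [{
        "level": "error",
        "message": (
            "Unexpected error processing events: "
            "com.maxmind.db.InvalidDatabaseException: Could not find a MaxMind DB "
            "metadata marker in this file (ip_geo). Is this a valid MaxMind DB file?"
            "\n\tat com.maxmind.db.Reader.findMetadataStart(Reader.java:278)\n"
        ),
    }],
    "failure_tstamp": "2018-07-04T10:19:25.085Z",
})

for summary in summarize_bad_row(sample):
    print(summary)
```

Note that the text pasted in this thread uses typographic quotes, so it would need to be copied from the original bad stream rather than from this page to parse as JSON.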


#3

Did you find a solution for this?


#4

On further investigation, I found that our network was having trouble downloading the file from http://snowplow-hosted-assets.s3.amazonaws.com/third-party/maxmind/GeoLite2-City.mmdb

This is possibly due to CloudFront blocking some traffic on certain URLs. I tried several different networks in Sydney, Australia, but I get very slow speeds on all of them (<10 kbps) before the download dies.

I am trying to find a location from which I can download this 54 MB file quickly, and will report back if that resolves the issue.


#5

It might also help if there are mirrors we could use of the hosted assets.


#6

I downloaded the file from the MaxMind website and hosted it in my own public S3 bucket, which seems to resolve the issue.
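For reference, a rough sketch of what the enrichment configuration might look like when pointed at a self-hosted copy. The bucket URL is hypothetical, and the field layout (`geo` with `database` and `uri`) is my understanding of the ip_lookups 2-0-0 schema, so please check it against the schema itself before using it.

```python
import json


def ip_lookups_config(base_uri: str, database: str = "GeoLite2-City.mmdb") -> dict:
    """Build an ip_lookups enrichment config pointing at a given asset location.

    Assumption: `uri` is the folder holding the file and `database` is the
    file name, as in the hosted-assets URL discussed in this thread.
    """
    return {
        "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
        "data": {
            "name": "ip_lookups",
            "vendor": "com.snowplowanalytics.snowplow",
            "enabled": True,
            "parameters": {
                "geo": {
                    "database": database,
                    # Trailing slash stripped so the path joins cleanly.
                    "uri": base_uri.rstrip("/"),
                }
            },
        },
    }


# Hypothetical bucket name, for illustration only.
config = ip_lookups_config("http://my-example-bucket.s3.amazonaws.com/maxmind/")
print(json.dumps(config, indent=2))
```

Hosting the file in a bucket in the same region as the enricher should avoid the slow download described above.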

It was only by chance that I figured out the cause of this issue. In a production pipeline, the enricher could just as well have kept writing events out as bad rows without reporting any problem, resulting in massive reruns to recover the events.

It is weird that the enricher throws no errors when it is not able to download the asset. I think we should capture such errors and display them upfront.
The right behaviour would be to fail and refuse to start the enricher.
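Until something like that exists in the enricher, a pre-flight check outside of it can catch a truncated download. The sketch below is not part of Snowplow: it simply looks for the metadata marker that the MaxMind DB format places near the end of a valid file (the same marker the error message complains about), and exits if it is missing. The function names are my own.

```python
import os

# Marker that precedes the metadata section in the MaxMind DB file format.
METADATA_MARKER = b"\xab\xcd\xefMaxMind.com"
# The format limits the metadata section (marker included) to 128 KiB,
# so only the tail of the file needs to be searched.
SEARCH_WINDOW = 128 * 1024


def looks_like_mmdb(path: str) -> bool:
    """Return True if the file's tail contains the MaxMind metadata marker."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - SEARCH_WINDOW))
        tail = f.read()
    return METADATA_MARKER in tail


def require_valid_mmdb(path: str) -> None:
    """Exit instead of continuing with a missing or truncated database."""
    if not os.path.isfile(path) or not looks_like_mmdb(path):
        raise SystemExit(f"{path} is missing or is not a complete MaxMind DB file")
```

Calling `require_valid_mmdb("GeoLite2-City.mmdb")` in a wrapper script before launching the enricher would stop the pipeline early. Note that this only detects the marker; it does not verify the rest of the file.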


#8

@arihantsurana / @vinayakfutak, the issue is likely to be related to cross-region download. The URI http://snowplow-hosted-assets.s3.amazonaws.com/third-party/maxmind always resolves to the eu-west-1 bucket, which would explain why you have problems downloading it (we’ve come across this issue when a pipeline was set up in ap-southeast-2).

We’ll be addressing this shortcoming in a future release, either by specializing the enrichment file locations to the proper region or by building a proxy that resolves to the proper region.