Problem on downloading maxmind data from S3 using Scala Stream Enrich 0.9.0

juanstiza · October 26, 2016, 1:25pm

Hi all. First of all, this is a copy of an issue I created on GitHub.
Stream Enrich is failing to download the MaxMind data sets from S3.
I’ve set up my ip_lookups.json as per this wiki page, like this:

{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoIPCity.dat",
                "uri": "s3://my-private-bucket.s3.amazonaws.com"
            },
            "isp": {
                "database": "GeoIPISP.dat",
                "uri": "s3://my-private-bucket.s3.amazonaws.com"
            },
            "organization": {
                "database": "GeoIPOrg.dat",
                "uri": "s3://my-private-bucket.s3.amazonaws.com"
            }
        }
    }
}

And Stream Enrich throws this error:

[main] ERROR com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$ - Error downloading s3:/my-private-bucket.s3.amazonaws.com/GeoIPCity.dat: java.lang.IllegalArgumentException: The bucket name parameter must be specified when requesting an object
Exception in thread "main" java.lang.RuntimeException: Attempt to download s3:/my-private-bucket.s3.amazonaws.com/GeoIPCity.dat to ./ip_geo failed
    at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$$anonfun$12.apply(KinesisEnrichApp.scala:172)
    at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$$anonfun$12.apply(KinesisEnrichApp.scala:154)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$delayedInit$body.apply(KinesisEnrichApp.scala:154)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$.main(KinesisEnrichApp.scala:71)
    at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp.main(KinesisEnrichApp.scala)

I was told here to fix the URIs like this:

            "geo": {
                "database": "GeoIPCity.dat",
                "uri": "s3://my-private-bucket"
            },
            "isp": {
                "database": "GeoIPISP.dat",
                "uri": "s3://my-private-bucket"
            },
            "organization": {
                "database": "GeoIPOrg.dat",
                "uri": "s3://my-private-bucket"
            }

And the process is returning the same error. Any pointers?
Note: I’ve used EC2 roles (on AWS) to grant that instance full access to S3.

Thanks!

josh · October 26, 2016, 1:36pm

Hi @juanstiza, I have re-opened your GitHub ticket. There is a regression in how we parse S3 URIs. I have posted a workaround in the ticket here.

Could you please give that a try and let us know if that resolves the issue for the moment?
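
To illustrate why the error complains about a missing bucket name, here is a minimal Python sketch of splitting an S3 URI into bucket and key. It is not the Stream Enrich code (which is Scala), and the helper name split_s3_uri is hypothetical. Note that the URI in the log above has a single slash after "s3:"; parsed in that form, the authority part is empty, so no bucket name can be extracted.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an S3 URI into (bucket, key). Hypothetical helper, for illustration only."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    # netloc is empty when only one slash follows "s3:", as in the logged URI
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if not bucket:
        raise ValueError(f"no bucket name in {uri}")
    return bucket, key
```

With this sketch, "s3://my-private-bucket/GeoIPCity.dat" splits into the bucket "my-private-bucket" and the key "GeoIPCity.dat", while the single-slash form from the log raises a ValueError because the bucket part is empty.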

juanstiza · October 26, 2016, 2:08pm

Still no go with auth. The EC2 instance has an IAM role attached with the AmazonS3FullAccess managed policy, and I used the AWS CLI to download the files with no issues (which might be the solution for now). Does the app read the access keys from the metadata? Should I attach an IAM user to the S3 URI?

josh · October 27, 2016, 8:31am

Hi @juanstiza, that's really odd. The application creates an S3 client based on the credentials supplied by its config. If these are set as iam in the config, then it should use the same credentials as the AWS CLI, assuming you ran the AWS CLI from the same instance?

Could you try putting access and secret keys into the Stream Enrich config to see whether it is then able to download?

juanstiza · October 27, 2016, 11:22am

@josh As we are testing our own storage process, we are using pipes, and we did not set the AWS configuration in the file as Kinesis wasn’t needed. Still, I find it strange that the client is not able to use the permissions granted by the IAM role, as this doc states (point 4).
Anyhow, we might just create access keys and use them. I’ll test that later today.
Thanks!

josh · October 27, 2016, 11:33am

It can use IAM, ENV or hardcoded values. In your config for stream enrich you will need to tell the application which ones to look for in the chain.
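
A rough Python sketch of that kind of selection logic, purely for illustration. The real application is Scala; the function name resolve_credentials, the "env" marker value, and the environment variable names are assumptions here rather than anything confirmed in this thread.

```python
import os


def resolve_credentials(access_key, secret_key, env=None):
    """Return (source, access_key, secret_key) for the configured mode.

    Hypothetical helper: the "iam" and "env" marker values are assumptions.
    """
    env = os.environ if env is None else env
    if access_key == "iam" and secret_key == "iam":
        # With an instance role, the SDK fetches the keys itself at request time
        return ("iam", None, None)
    if access_key == "env" and secret_key == "env":
        # Assumed variable names, matching the usual AWS conventions
        return ("env", env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"])
    if access_key in ("iam", "env") or secret_key in ("iam", "env"):
        raise ValueError("access key and secret key must use the same mode")
    # Anything else is treated as literal, hardcoded keys
    return ("static", access_key, secret_key)
```

The point of the sketch is that nothing is looked up unless the config names a mode: if the AWS section is left unset, the client has no reason to consult the instance role.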

juanstiza · October 27, 2016, 11:48am

Sorry, now I understand (too early here heh). I’ll test today, but it will most likely work. Thanks again!
