Enrichments: how to enable them in the quickstart examples?

Hello,

I’m writing this topic after trying to find information about how to properly enable an enrichment. If someone has already created a similar post, sorry, and I’d appreciate it if someone could point me to it, as I haven’t been able to find the answer myself so far.

I’m running the quickstart project on AWS and almost everything is working fine: I can collect and query my events, and so on.

Then I started exploring further and decided to test the ip_lookups enrichment. Following the docs, I found the basics about what enrichments are and which ones are available, but when I started reading and following the guide to set up the IP Lookups enrichment, I ran into problems at step 3:

3. Configure the enrichment for your pipeline

Snowplow BDP customers can enable the IP Lookup enrichment for your pipeline in the Snowplow console. Open Source will need to upload the enrichment json for use in their Snowplow pipeline.

I understand there is a console for BDP, but for open source we need to upload the enrichment file (I already have the JSON file pointing to the URL of my self-hosted MaxMind database, following the example), and I can’t find where that file should be uploaded. It’s unclear to me: I can’t find a place for it in my pipeline, and I’ve already pushed the schema files to my Iglu Server, right? So is uploading this file the only missing piece? Where, and how? Is there a guide I’ve missed?

Maybe I’m missing part of the docs, or I don’t understand how to upload that file using igluctl.

Also, I noticed that the link at the very bottom, in the Output section, may be wrong; it appears to have been copied from another enrichment:

Output

This enrichment adds a new context to the enriched event with this schema.

Thank you so much! :slight_smile:

Hey guys, I found more information:

I missed this part (my bad) during my quickstart journey:

And also, this:

I edited my Terraform file, added the ip_lookups enrichment section, and applied the changes, but now I’ve stopped receiving events entirely. Does applying the changes affect something I need to restart or adjust? I can see the tracker posting events, but they no longer appear in my PostgreSQL database. Any help appreciated.

It’s likely that something went wrong with the enrichment, or the enrichment has caused the data to fail validation. Either way, the data would land in failed events; if you check there, there should be error messages to help you debug.
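To make checking failed events a bit quicker, here is a minimal Python sketch. It assumes you have copied the files from the pipeline’s bad-rows bucket to a local folder (for example with `aws s3 cp --recursive`), that each line is one self-describing bad-row JSON with top-level `schema` and `data` keys, and that the error details sit under `data.failure` as `messages` or `message`. The folder name `bad-rows` and the helper names are hypothetical; the script only groups failures by bad-row schema and prints a few messages for each.

```python
import gzip
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple


def iter_bad_rows(paths: Iterable[Path]) -> Iterator[dict]:
    """Yield failed events, assuming one self-describing JSON object per line."""
    for path in paths:
        # Loader output may or may not be gzip-compressed; handle both.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield json.loads(line)


def failure_messages(row: dict) -> List[str]:
    """Return the error messages found under data.failure, as strings."""
    failure = row.get("data", {}).get("failure", {})
    # Assumption: failures carry either a "messages" list or a single "message".
    raw = failure.get("messages", failure.get("message", []))
    if not isinstance(raw, list):
        raw = [raw]
    return [m if isinstance(m, str) else json.dumps(m) for m in raw]


def summarise(paths: Iterable[Path]) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count failed events per bad-row schema and keep the first messages seen for each."""
    counts: Counter = Counter()
    examples: Dict[str, List[str]] = {}
    for row in iter_bad_rows(paths):
        schema = row.get("schema", "<no schema>")
        counts[schema] += 1
        examples.setdefault(schema, failure_messages(row))
    return counts, examples


if __name__ == "__main__":
    folder = Path("bad-rows")  # hypothetical local copy of the bad bucket
    if folder.is_dir():
        files = sorted(p for p in folder.rglob("*") if p.is_file())
        counts, examples = summarise(files)
        for schema, n in counts.most_common():
            print(f"{n:6d}  {schema}")
            for message in examples[schema][:3]:
                print(f"        {message}")
```

An `enrichment_failures` schema at the top of the output would point at the enrichment itself, while `schema_violations` would point at validation.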

I edited my terraform file, added the enrichment part for the ip_lookups, and applied the changes, but now I stopped receiving events at all.

That worked for us! Could you post what you entered as the enrichment JSON?

Hey guys, thanks for the replies!

Here is the code I’m running in my Terraform config, following the guide posted above.

```hcl
# ip_lookups enrichment config, with the MaxMind database hosted in S3.
# "uri" is the folder that holds the file named in "database".
locals {
  enrichment_ip_lookups = jsonencode(<<EOF
{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoLite2-City.mmdb",
                "uri": "s3://path-to-my-s3-assets/third-party/maxmind"
            }
        }
    }
}
EOF
  )
}
```

and:

```hcl
  # Enable this enrichment
  enrichment_ip_lookups = local.enrichment_ip_lookups
```
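Because the enrichment config lives inside a heredoc string, a typo there only surfaces after `terraform apply`. A minimal pre-flight check in Python, assuming you paste the same JSON text into a string: it only checks the fields used in the config above (`schema`, `data.name`, `data.enabled`, `parameters.geo.database`, `parameters.geo.uri`) and is not a substitute for validating against the actual ip_lookups schema in Iglu. The function name `check_ip_lookups` is hypothetical.

```python
import json
from typing import List
from urllib.parse import urlparse


def check_ip_lookups(config_text: str) -> List[str]:
    """Return problems found in an ip_lookups enrichment config (empty list if none)."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    if "/ip_lookups/jsonschema/" not in config.get("schema", ""):
        problems.append("schema is not an ip_lookups schema")
    data = config.get("data", {})
    if data.get("name") != "ip_lookups":
        problems.append("data.name should be 'ip_lookups'")
    if data.get("enabled") is not True:
        problems.append("data.enabled is not true")
    geo = data.get("parameters", {}).get("geo", {})
    if not geo.get("database", "").endswith(".mmdb"):
        problems.append("parameters.geo.database should name an .mmdb file")
    # Assumption: uri must be an absolute location such as s3://bucket/folder.
    uri = urlparse(geo.get("uri", ""))
    if not uri.scheme or not uri.netloc:
        problems.append("parameters.geo.uri should be an absolute URI like s3://bucket/folder")
    return problems
```

An empty list means the basic shape looks right; it says nothing about whether the pipeline can actually read the file at `uri`.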

After that I ran terraform apply. Everything updated, and then I stopped receiving events.

Do the bucket or the .mmdb file need some special permission? The file is in my private assets bucket.

I’ll now look at the failed events stream / bad buckets to see if I can find more details.

Hey guys, just to let you and others know how I solved my issue (and to get feedback if I did something outside the usual patterns).

I didn’t find any information in the bad buckets or tables, and I didn’t see anything wrong in the streams either, so I had no visibility into what was happening: the events simply didn’t appear in the table.

I tried uploading my MaxMind file to a new bucket, but changed its permissions to make it publicly accessible, and the events started being enriched successfully. I’m not sure this is correct, but while the file was in a private bucket it didn’t work. Maybe a misconfiguration on my side?

The docs only say to upload the file to a private bucket, and as far as I know, the permissions from my Terraform configs should be enough for the enrichment server to access the file. No? Or is the only missing step to make the database file public?

Thank you! And please, if my solution is wrong, I’d appreciate some guidance on how to improve it.

You shouldn’t need to have this in a public bucket.

I’d double check:

  • simulating the role / policy that you have set up on the instance that is fetching the asset, to ensure the permissions are correct
  • checking whether any errors or warnings are raised on the instance during initialisation if it cannot fetch from S3
  • failing that, looking at the CloudTrail logs (for S3 specifically) to see whether you can find the API call itself that is likely failing
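For the first point, the IAM policy simulator can be called from boto3 via `simulate_principal_policy`. A minimal sketch, assuming you know the ARN of the role attached to the enrich instances and the S3 object ARN of the `.mmdb` file, and that your own credentials are allowed to call `iam:SimulatePrincipalPolicy`. It evaluates the role’s own policies; a bucket policy is only taken into account if you pass it in via the `ResourcePolicy` parameter, which this sketch does not do. The helper names are hypothetical.

```python
from typing import Iterable, List


def denied_results(response: dict) -> List[str]:
    """List the action/resource pairs that the simulation did not allow."""
    return [
        f"{r['EvalActionName']} on {r['EvalResourceName']}: {r['EvalDecision']}"
        for r in response.get("EvaluationResults", [])
        if r.get("EvalDecision") != "allowed"
    ]


def simulate_s3_read(role_arn: str, object_arns: Iterable[str]) -> List[str]:
    """Ask IAM whether role_arn may call s3:GetObject on each object ARN."""
    import boto3  # imported here so denied_results works without boto3 installed

    iam = boto3.client("iam")
    response = iam.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=["s3:GetObject"],
        ResourceArns=list(object_arns),
    )
    return denied_results(response)
```

For example, `simulate_s3_read("arn:aws:iam::123456789012:role/my-enrich-role", ["arn:aws:s3:::path-to-my-s3-assets/third-party/maxmind/GeoLite2-City.mmdb"])` (both ARNs hypothetical) returns an empty list when reading the file is allowed, and one line per denied pair otherwise, such as an `implicitDeny` when the bucket is missing from the role’s policy.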

Hi @prss, you likely need to use this option:

This allows you to add the bucket you are hosting the databases in to the IAM policy for the role. Hope this helps!
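Judging by its name, the option expects the name of the bucket holding the databases rather than the full asset URI. A small Python sketch that derives it from the `uri` in the ip_lookups config. Assumption: the module variable itself is `custom_s3_hosted_assets_bucket_name`; the `input_` prefix in the name quoted below looks like the anchor that generated module READMEs put in front of input names, so check the module’s variables before relying on it. The helper names are hypothetical.

```python
from urllib.parse import urlparse


def hosted_assets_bucket(uri: str) -> str:
    """Return the bucket name from an s3:// asset URI such as the ip_lookups geo uri."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc


def terraform_line(uri: str) -> str:
    """Render the module argument (variable name is an assumption, see above)."""
    return f'custom_s3_hosted_assets_bucket_name = "{hosted_assets_bucket(uri)}"'
```

With the config from this thread, the bucket to grant access to would be `path-to-my-s3-assets`, not `s3://path-to-my-s3-assets/third-party/maxmind`.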

Thanks @josh and @mike. I’ve tested the input_custom_s3_hosted_assets_bucket_name option and it worked :slight_smile:

Many kudos!


Glad to hear it! If you have time, we would welcome a PR to the README that would have made that setting more obvious to you, to save future users the pain.
