Configurable enrichments with Stream Enrich?


#1

Hi all,

Do configurable enrichments work with Stream Enrich, or do I need to have emr-etl-runner? I have been trying the ip_lookups and user utils enrichments with Stream Enrich, but without much luck. Please help.


#2

Hi @manju,

Yes - Stream Enrich supports all of Snowplow’s configurable enrichments! Can you share how you are starting up Stream Enrich with the enrichments? There may be a misstep there.

Alex


#3

Hi @alex,

Thanks for getting back. I really hope you could help with this.

I am starting Stream Enrich the same way it is done in snowplow-mini (init scripts):

./snowplow-stream-enrich-0.8.1 --config ~/snowplow/enrichconfig.hocon --resolver file:/home/ubuntu/snowplow/iglu_resolver.json --enrichments file:/home/ubuntu/snowplow/enrichments/

Here are the ip_lookups and user config enrichments in the enrichments folder:

{
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": true,
        "parameters": {
            "geo": {
                "database": "GeoLiteCity.dat",
                "uri": "https://s3-us-west-2.amazonaws.com/****/GeoLiteCity.dat"
            }
        }
    }
}

iglu_resolver.json:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}

#4

For the IP lookups enrichment, follow exactly the simpler example configuration on this page and it should work fine:


#5

Hi @alex,

Thanks again so much for getting back. I have tried exactly the same configuration, but without luck. I also do not see the ip_lookups schema on iglucentral.com. Should I add another repository to iglu_resolver.json? I have been trying this for a very long time. Any help is highly appreciated. Thanks so much.
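
On the question of whether the schema is on Iglu Central: an `iglu:` URI has the form `iglu:vendor/name/format/version`, and a static HTTP repository is generally expected to serve schemas under a `/schemas/` path. The Python sketch below builds the URL that can then be checked by hand in a browser. `iglu_uri_to_url` is a hypothetical helper, not part of any Snowplow tool, and the `/schemas/` layout is an assumption that does not come from this thread.

```python
import re

# Assumed shape: iglu:<vendor>/<name>/<format>/<MODEL-REVISION-ADDITION>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.-]+)/(?P<name>[A-Za-z0-9_-]+)/"
    r"(?P<format>[A-Za-z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)


def iglu_uri_to_url(schema_uri, repo_uri="http://iglucentral.com"):
    """Map an iglu: schema URI to the URL a static repository would serve it at.

    Assumes the repository exposes schemas under <repo_uri>/schemas/.
    """
    match = IGLU_URI.match(schema_uri)
    if match is None:
        raise ValueError(f"not a valid iglu: URI: {schema_uri!r}")
    return "{}/schemas/{vendor}/{name}/{format}/{version}".format(
        repo_uri.rstrip("/"), **match.groupdict()
    )
```

Calling `iglu_uri_to_url("iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0")` gives a single URL to open. If it loads, the resolver configuration above should not need an extra repository for this schema.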


#6

Hi @manju, would you be able to share any logs from the Stream Enrich application starting up?


#7

Hi @josh,

Thank you very much for responding. My setup is similar to snowplow-mini. I do not see any logs, or perhaps I have not enabled them. Nothing shows up in stderr-pipe, though. If you can let me know how to enable logging, I will do that. Sorry, I am quite new to this.


#8

What I am after is anything coming from stdout or stderr. Please run this command and paste whatever appears in the generated log file back to this thread:

./snowplow-stream-enrich-0.8.1 --config ~/snowplow/enrichconfig.hocon --resolver file:/home/ubuntu/snowplow/iglu_resolver.json --enrichments file:/home/ubuntu/snowplow/enrichments/ >> stream_enrich_log_file.txt 2>&1

#9

Hi @josh,

Thank you so much again! After enabling the logs as you said, I figured out that the enrichments folder was not being picked up at all. It had somehow escaped my notice. I corrected it and now it is working. This Discourse was definitely helpful in getting me to this point. I didn’t know how to set up enrichments initially, so I really appreciate your and @alex’s help. Thank you so much for helping the newbies. Please keep it up.
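
For anyone hitting the same problem, an `--enrichments` path that is not being picked up can be caught before launch with a small pre-flight check. The Python sketch below is not part of Stream Enrich, and `check_enrichments_dir` is a hypothetical name. It assumes only what this thread shows: a `file:` URI pointing at a directory of self-describing JSON files, each with `schema` and `data` keys.

```python
import json
from pathlib import Path


def check_enrichments_dir(uri):
    """Return (name, enabled) pairs for each enrichment JSON under a file: URI.

    Raises ValueError if the directory is missing, holds no .json files,
    or a file is not a self-describing JSON with "schema" and "data" keys.
    """
    if not uri.startswith("file:"):
        raise ValueError(f"expected a file: URI, got {uri!r}")
    directory = Path(uri[len("file:"):])
    if not directory.is_dir():
        raise ValueError(f"{directory} is not a directory")

    files = sorted(directory.glob("*.json"))
    if not files:
        raise ValueError(f"no .json files found in {directory}")

    found = []
    for path in files:
        try:
            with path.open(encoding="utf-8") as handle:
                doc = json.load(handle)
        except json.JSONDecodeError as err:
            raise ValueError(f"{path.name}: invalid JSON ({err})") from err
        if not isinstance(doc, dict) or "schema" not in doc or "data" not in doc:
            raise ValueError(f"{path.name}: missing 'schema' or 'data'")
        data = doc["data"]
        if not isinstance(data, dict):
            raise ValueError(f"{path.name}: 'data' is not an object")
        # Fall back to the file name when the config carries no "name" field.
        found.append((data.get("name", path.stem), bool(data.get("enabled", False))))
    return found
```

Running `check_enrichments_dir("file:/home/ubuntu/snowplow/enrichments/")` against the folder from the command above lists each enrichment and whether it is enabled. It raises an error if the folder is empty or missing. It does not validate the configs against their Iglu schemas.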


#10

Hi @manju, glad you got it sorted!

When you get to the point of scaling up your Stream Enrich instances, you might want to look at moving the enrichment JSONs and your Iglu resolver into DynamoDB instead. This means you won’t need to manually copy these files to each running server, and it ensures you have one source of truth, which is much easier to debug!

See this page for help: https://github.com/snowplow/snowplow/wiki/Run-Stream-Enrich#running