Snowplow Stream Enrichment

Hi All,

I am setting up enrichment with Stream Enrich, using the jar file downloaded as instructed at the link below.

To configure it, we need its own configuration file plus a second one: the JSON configuration for the Iglu resolver, which is used to look up JSON schemas (see the link below).

A sample JSON file is also provided at the following path:

https://raw.githubusercontent.com/snowplow/snowplow/master/3-enrich/config/iglu_resolver.json
Now I am stuck with this sample JSON file. It configures many different parameters, and I don't know what each of them is for, or what values I should set for them to make it run.

Can anyone please help?

@miteshu,

Those configuration files serve different purposes; they are not alternatives.

  • config.hocon: configures Stream Enrich itself. It’s specific to your pipeline implementation; make changes to reflect your architecture.
  • Iglu resolver configuration: points Stream Enrich at one or more Iglu servers, i.e. repositories of the JSON schemas used to validate the events you track. If you do not use custom events, there is no need to amend it.

Both configuration files must be passed as command-line parameters, as described in Run Stream Enrich.

Hello @ihor,

Thanks a lot for the heads-up.

Yes, I know these two configuration files serve different purposes and are not alternatives; I am using both files and passing them as parameters.

My only question is how to configure the Iglu resolver, i.e. what values to set inside it.

I can't tell what each parameter in the Iglu resolver config file is for, so I don't know what values to use for them.

Do you have a sample Iglu resolver config file so that I can see how it is configured?

@miteshu, as I said, if you do not use custom events there is no need to amend the resolver configuration file. In other words, keep its content as it is:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Iglu Central - GCP Mirror",
        "priority": 1,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://mirror01.iglucentral.com"
          }
        }
      }
    ]
  }
}
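For reference, the fields in this file mean roughly the following (per the Iglu resolver documentation): cacheSize is how many schemas the resolver keeps in its in-memory cache; each entry in repositories has a human-readable name, a priority (lower numbers are tried first), vendorPrefixes (the schema vendors this repository is preferred for) and connection.http.uri (the repository's base URL). The Python sketch below models the lookup order under a simplifying assumption: repositories whose vendorPrefixes prefix-match the schema's vendor come first, with ties broken by ascending priority. The helpers parse_iglu_uri and lookup_order are hypothetical, not part of any Snowplow library.

```python
import json

# The default resolver configuration quoted above (resolver-config 1-0-1).
RESOLVER_CONFIG = """
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {"name": "Iglu Central", "priority": 0,
       "vendorPrefixes": ["com.snowplowanalytics"],
       "connection": {"http": {"uri": "http://iglucentral.com"}}},
      {"name": "Iglu Central - GCP Mirror", "priority": 1,
       "vendorPrefixes": ["com.snowplowanalytics"],
       "connection": {"http": {"uri": "http://mirror01.iglucentral.com"}}}
    ]
  }
}
"""


def parse_iglu_uri(uri):
    """Split 'iglu:vendor/name/format/version' into its four parts (hypothetical helper)."""
    prefix = "iglu:"
    if not uri.startswith(prefix):
        raise ValueError(f"not an Iglu URI: {uri}")
    vendor, name, fmt, version = uri[len(prefix):].split("/")
    return vendor, name, fmt, version


def lookup_order(config, schema_uri):
    """Return repository names in the order they would be tried for schema_uri.

    Simplified model (an assumption, not the exact Iglu client logic):
    repositories whose vendorPrefixes prefix-match the schema vendor go first,
    then ties are broken by ascending 'priority'.
    """
    vendor = parse_iglu_uri(schema_uri)[0]

    def sort_key(repo):
        matches = any(vendor.startswith(p) for p in repo.get("vendorPrefixes", []))
        return (0 if matches else 1, repo["priority"])

    repos = sorted(config["data"]["repositories"], key=sort_key)
    return [repo["name"] for repo in repos]


if __name__ == "__main__":
    config = json.loads(RESOLVER_CONFIG)
    print("cacheSize:", config["data"]["cacheSize"])
    print(lookup_order(config, "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1"))
```

For a Snowplow schema this prints ['Iglu Central', 'Iglu Central - GCP Mirror']: both repositories match the com.snowplowanalytics prefix, so priority decides, and the mirror is presumably only consulted when Iglu Central cannot serve the schema.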

Those are Snowplow public repositories. You are welcome to use them.

More info on Iglu repositories is here: https://github.com/snowplow/snowplow/wiki/Iglu-registry


@ihor

Thanks again.

I just wanted to let you know that I have done the same: as you suggested, I kept the file unchanged, and I am running the command below:

$ java -jar snowplow-stream-enrich-0.12.0.jar --config Stream-enrich.conf --resolver file:resolver.json

Running it produces the error below:

[Screenshot: stream-enrich error output]

Could you please tell me if I am doing something wrong? The error message shows the content of the Iglu resolver file itself.

@miteshu, the error states “invalid JSON”. Your editor may have introduced one or more hidden (invisible) characters. Could you try linting the content of the configuration file? There are several online linting tools; one of them is http://zaa.ch/jsonlint/.
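Online validators only check the text you paste into them, so a stray character in the file itself (for example a byte order mark, which some Windows editors add when saving as UTF-8) may not show up there. A minimal Python sketch that scans the file for invisible or typographic characters and then parses it could look like this; the list of suspicious characters and the function names are illustrative assumptions, not an exhaustive check.

```python
import json
import unicodedata

# Characters that editors commonly introduce and that break JSON parsing
# outside of string values (illustrative, non-exhaustive list).
SUSPICIOUS = {
    "\ufeff": "BYTE ORDER MARK",
    "\u200b": "ZERO WIDTH SPACE",
    "\u00a0": "NO-BREAK SPACE",
    "\u201c": "LEFT DOUBLE QUOTATION MARK",
    "\u201d": "RIGHT DOUBLE QUOTATION MARK",
}


def find_hidden_chars(text):
    """Return (line, column, name) for each suspicious or invisible control/format character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPICIOUS:
                hits.append((lineno, col, SUSPICIOUS[ch]))
            elif unicodedata.category(ch) in ("Cc", "Cf") and ch != "\t":
                # Other control (Cc) or format (Cf) characters; tabs are valid JSON whitespace.
                hits.append((lineno, col, unicodedata.name(ch, hex(ord(ch)))))
    return hits


def check_config(path):
    """Report hidden characters in a config file, then try to parse it as JSON."""
    try:
        # Plain "utf-8" (not "utf-8-sig") keeps a leading BOM so it gets reported.
        with open(path, encoding="utf-8") as f:
            text = f.read()
    except UnicodeDecodeError as err:
        print(f"file is not valid UTF-8: {err}")
        return False
    for lineno, col, name in find_hidden_chars(text):
        print(f"line {lineno}, column {col}: {name}")
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print(f"invalid JSON: {err}")
        return False
    print("valid JSON")
    return True
```

Running check_config("resolver.json") from the directory where you launch Stream Enrich checks the exact file passed via --resolver, rather than a copy pasted into a browser.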

@ihor

Thanks.

I initially thought the same, so I had already validated the JSON on
https://jsonformatter.curiousconcept.com/ and it seems to be fine.

Below is the resolver JSON I am using, which has passed validation.

I have already checked it a number of times, but do you see anything else we might have missed?

@ihor, what changes must I make to the Iglu resolver file if I want to track custom events?

@Alexandre_Rayes, for a start, you need your own Iglu server. The easiest way is to set up a static one. In practice, this means a dedicated S3 bucket with static website hosting enabled so that it is accessible over HTTP: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/static-website-hosting.html.

Each JSON schema must be uploaded to a specific path in the bucket for the resolver to find it. For example, if your JSON schema has the following self section:

{
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Schema for my_event",
    "self": {
        "vendor": "com.acme",
        "name": "my_event",
        "format": "jsonschema",
        "version": "1-0-0"
    },
   . . .

then it must be uploaded as a file named 1-0-0 (the version) to the /schemas/com.acme/my_event/jsonschema/ path of the Iglu bucket.
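The mapping from the self section to the upload path is mechanical, so it can be scripted. A small Python sketch, assuming the schema has been loaded as a dict; schema_key and schema_url are hypothetical helper names, and http://iglu.example.com stands in for your static website URI:

```python
import json
import re

# SchemaVer versions look like MODEL-REVISION-ADDITION, e.g. 1-0-0.
SCHEMAVER = re.compile(r"^\d+-\d+-\d+$")


def schema_key(schema):
    """Return the bucket key under which a static Iglu repository serves this schema."""
    self_desc = schema["self"]
    if not SCHEMAVER.match(self_desc["version"]):
        raise ValueError(f"not a SchemaVer version: {self_desc['version']}")
    return "schemas/{vendor}/{name}/{format}/{version}".format(**self_desc)


def schema_url(base_uri, schema):
    """URL the resolver would request, given the repository's base URI."""
    return base_uri.rstrip("/") + "/" + schema_key(schema)


if __name__ == "__main__":
    my_event = json.loads("""
    {
      "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
      "description": "Schema for my_event",
      "self": {"vendor": "com.acme", "name": "my_event",
               "format": "jsonschema", "version": "1-0-0"}
    }
    """)
    print(schema_key(my_event))
    print(schema_url("http://iglu.example.com", my_event))
```

Note that the object key has no .json extension. With the AWS CLI, for instance, the upload would be aws s3 cp 1-0-0 s3://<your-bucket>/schemas/com.acme/my_event/jsonschema/1-0-0.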

For more on self-describing JSON schemas, see: https://github.com/snowplow/iglu/wiki/Self-describing-JSON-Schemas.

Once your own Iglu server is up and running, you need to add it to the resolver configuration in the same fashion as the default Iglu server (Iglu Central):

{
  "name": "My Iglu server",
  "priority": 0,
  "vendorPrefixes": [
    "com.acme"
  ],
  "connection": {
    "http": {
      "uri": "http://<URI-to-static-website>"
    }
  }
}
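Adding the entry can also be scripted. A minimal Python sketch, assuming the resolver configuration has been loaded as a dict; add_repository and save_config are hypothetical helpers, and the URI is the same placeholder as above:

```python
import copy
import json


def add_repository(config, repo):
    """Return a copy of a resolver config with repo appended to its repositories."""
    existing = {r["name"] for r in config["data"]["repositories"]}
    if repo["name"] in existing:
        raise ValueError(f"repository already present: {repo['name']}")
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    updated["data"]["repositories"].append(repo)
    return updated


def save_config(config, path):
    """Write the config back as pretty-printed JSON (plain UTF-8, no BOM)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
        f.write("\n")


MY_IGLU = {
    "name": "My Iglu server",
    "priority": 0,
    "vendorPrefixes": ["com.acme"],
    # Placeholder: replace with your static website URI.
    "connection": {"http": {"uri": "http://<URI-to-static-website>"}},
}

if __name__ == "__main__":
    base = {
        "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
        "data": {"cacheSize": 500, "repositories": [
            {"name": "Iglu Central", "priority": 0,
             "vendorPrefixes": ["com.snowplowanalytics"],
             "connection": {"http": {"uri": "http://iglucentral.com"}}}]},
    }
    updated = add_repository(base, MY_IGLU)
    print([r["name"] for r in updated["data"]["repositories"]])
```

To update a real file, load it with json.load, pass the result to add_repository, and write it back with save_config. Both repositories keep priority 0 here; the com.acme vendor prefix is what should make the resolver prefer your own server for com.acme schemas.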