No longer able to find custom contexts after update


#1

Greetings!

I recently upgraded my Snowplow setup in its entirety (a version bump of about 30 releases). Since then I see errors pop up in my enricher error stream[1]. They occur systematically (in every batch write), but not for all events (approximately 10 in a batch of 300). They indicate that my custom enrichments can no longer be found, but I have not changed anything in my resolver.conf[2] or in the enrichment itself, and I have not found any mention of syntax changes for either.

Any help and guidance is greatly appreciated

[1] Error from enricher error eventstream

 "errors": [
        {
            "level": "error",
            "message": "error: Could not find schema with key iglu:io.ontame/application_step/jsonschema/1-0-0 in any repository, tried:\n    level: \"error\"\n    repositories: [\"Iglu Central [HTTP]\",\"Iglu Client Embedded [embedded]\",\"Ontame Custom [HTTP]\"]\n"
        }

[2] resolver.conf

{
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
    "data": {
        "cacheSize": 500,
        "repositories": [
            {
                "name": "Iglu Central",
                "priority": 0,
                "vendorPrefixes": [
                    "com.snowplowanalytics"
                ],
                "connection": {
                    "http": {
                        "uri": "http://iglucentral.com"
                    }
                }
            },
            {
                "name": "Ontame Custom",
                "priority": 0,
                "vendorPrefixes": [
                    "io.ontame"
                ],
                "connection": {
                    "http": {
                        "uri": "http://ontame-schemas.s3-website-eu-west-1.amazonaws.com/"
                    }
                }
            }
        ]
    }
}

Kind regards
Christoffer


#2

Hi @cstpdk,

Right - a lot has changed in 30 releases, so we need to dig into what your expected behavior is in order to figure out what's changed.

What you’ve shared with us in [1] suggests a missing schema, which is odd because the referenced schema is available:

http://ontame-schemas.s3-website-eu-west-1.amazonaws.com/schemas/io.ontame/application_step/jsonschema/1-0-0
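For context on how that URL comes about: an Iglu resolver maps a schema key onto a path under each repository's base URI, preferring repositories whose `vendorPrefixes` match the schema's vendor. A rough sketch of that lookup in Python (the helper names are illustrative, not Snowplow's actual code):

```python
# Sketch of Iglu-style schema resolution: a key such as
# "iglu:io.ontame/application_step/jsonschema/1-0-0" maps to
# "<repo-uri>/schemas/io.ontame/application_step/jsonschema/1-0-0".
# Function names are hypothetical, not Snowplow's implementation.

def schema_url(repo_uri: str, schema_key: str) -> str:
    """Build the HTTP URL for a schema key against one repository."""
    path = schema_key.removeprefix("iglu:")
    return repo_uri.rstrip("/") + "/schemas/" + path

def candidate_urls(repositories, schema_key):
    """Repositories whose vendorPrefixes match the schema's vendor are tried."""
    vendor = schema_key.removeprefix("iglu:").split("/")[0]
    matching = [r for r in repositories
                if any(vendor.startswith(p) for p in r["vendorPrefixes"])]
    return [schema_url(r["uri"], schema_key) for r in matching]

# The two repositories from the resolver.conf above
repos = [
    {"name": "Iglu Central", "uri": "http://iglucentral.com",
     "vendorPrefixes": ["com.snowplowanalytics"]},
    {"name": "Ontame Custom",
     "uri": "http://ontame-schemas.s3-website-eu-west-1.amazonaws.com/",
     "vendorPrefixes": ["io.ontame"]},
]

print(candidate_urls(repos, "iglu:io.ontame/application_step/jsonschema/1-0-0"))
```

Running this yields exactly the S3 website URL above, which is why the "not found" error is surprising.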

The thing I wanted to check: you said your custom enrichments cannot be found. Did you mean custom contexts, not custom enrichments? That would make more sense.

Can I ask:

  • What version did you upgrade from?
  • What version are you running now?
  • How many nodes are running enrich?
  • How many custom schemas do you have - are the 10 failed events always looking for the same “missing” schema, or a mix?

#3

Hi @alex,

Thank you so much for getting back to me.

Yes, I suppose I mean contexts - sorry, I get a little overwhelmed by the terminology at times. The folder is called enrichments, which may be what confuses me.

On to your questions:

1 and 2) I upgraded the enricher from 0.5.0 to 0.8.0. I should probably also mention that it is the Scala / Kinesis enricher
3) One node running enrich
4) There is only that one custom schema

Let me know if there is any other info I can provide, and once again thanks for helping out


#4

Thanks - I wanted to ask about this:

Can you define what you mean by a batch? The strange thing is that schemas should be cached by Stream Enrich, and so once a schema has been found and cached, it should be available to all subsequent events.


#5

Hi again @alex,

Thanks again for helping

You write:

And that was also my understanding from looking at the source. I am not sure about the logic for loading custom contexts, though: could it be that it only applies to some events (e.g. only unstructured events)?

Regarding batch, maybe if I give you my log output it is clear what I mean:

[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 1 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 12 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 4 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 1 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 4 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 2 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 10 records from shardId-000000000000
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Current stream shard assignments: shardId-000000000000
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Sleeping ...
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 3 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 3 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource - Processing 5 records from shardId-000000000000
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Writing 246 records to Kinesis stream events-enriched
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Writing 15 records to Kinesis stream events-enriched-error
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Successfully wrote 246 out of 246 records
[pool-2-thread-1] INFO com.snowplowanalytics.snowplow.enrich.kinesis.sinks.KinesisSink - Successfully wrote 15 out of 15 records

My understanding is that it keeps enriched records in memory until it reaches some threshold, after which it writes them to Kinesis. What gets written to Kinesis in one go is what I call a “batch”. Am I correct in this?
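That reading matches the log output above: records accumulate in an in-memory buffer and are flushed to the stream once some threshold is crossed. A toy model of that flush-on-threshold behaviour (hypothetical names, not the real KinesisSink, which also flushes on byte-size and time limits):

```python
class BufferedSink:
    """Accumulate records and flush once a record-count threshold is hit;
    a toy model of batch flushing, not the actual KinesisSink."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.buffer = []
        self.flushed_batches = []

    def store(self, record) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            # In the real sink this would be a PutRecords call to Kinesis.
            self.flushed_batches.append(list(self.buffer))
            self.buffer.clear()

sink = BufferedSink(threshold=246)
for i in range(250):
    sink.store(i)
print(len(sink.flushed_batches),      # one "batch" written so far
      len(sink.flushed_batches[0]),   # containing 246 records
      len(sink.buffer))               # 4 records still buffered
```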

Thanks again!