Stream Enrich is trying to validate against the SelfDescribingJson schema instead of the one I specified


#1

I have recently been trying to add some extra validation to my pipeline by sending unstructured events (unstruct_event), validated against my own Iglu schema, through Stream Enrich.

It seems that the Iglu client within Stream Enrich is trying to regex-validate against the self-describing JSON schema instead of the one I have specified. Yet when I send the event from my tracker without wrapping it in SelfDescribingJson(), it fails the check that it is a SelfDescribingJson. What should I do?

Below are:

  • enrich error
  • tracking code
  • intended schema
  • resolver.json

<-------------- ENRICH ERROR EXAMPLE --------------->
{
  "line": "{lots of base 64 line}",
  "errors": [
    {
      "level": "error",
      "message": "error: ECMA 262 regex \"^iglu:[a-zA-Z0-9-_.]+/[a-zA-Z0-9-_]+/[a-zA-Z0-9-_]+/[0-9]+-[0-9]+-[0-9]+\" does not match input string \"http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0\"\n level: \"error\"\n schema: {\"loadingURI\":\"#\",\"pointer\":\"/properties/schema\"}\n instance: {\"pointer\":\"/schema\"}\n domain: \"validation\"\n keyword: \"pattern\"\n regex: \"^iglu:[a-zA-Z0-9-_.]+/[a-zA-Z0-9-_]+/[a-zA-Z0-9-_]+/[0-9]+-[0-9]+-[0-9]+\"\n string: \"http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0\"\n"
    },
    {
      "level": "error",
      "message": "Unstructured event couldn't be extracted"
    }
  ],
  "failure_tstamp": "2016-11-15T12:53:07.633Z"
}
<-------------- END ENRICH ERROR EXAMPLE --------------->
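For reference, the `pattern` check in this error can be reproduced locally. The error's pointers (`/properties/schema` for the rule, `/schema` for the instance) indicate that it is the `schema` field of a self-describing JSON instance that failed, and the regex only accepts `iglu:` URIs. A minimal Python sketch using the pattern copied from the error (the `-` inside the character classes is escaped for clarity; `is_iglu_uri` is a hypothetical helper name):

```python
import re

# Pattern copied from the enrich error; "-" inside the classes is escaped for clarity.
IGLU_URI = re.compile(
    r"^iglu:[a-zA-Z0-9\-_.]+/[a-zA-Z0-9\-_]+/[a-zA-Z0-9\-_]+/[0-9]+-[0-9]+-[0-9]+"
)


def is_iglu_uri(schema: str) -> bool:
    """Return True if `schema` would pass the "pattern" check on the "schema" field."""
    return IGLU_URI.match(schema) is not None


# The value that failed in the error above
print(is_iglu_uri("http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0"))  # False
# The schema URI the tracker code sends
print(is_iglu_uri("iglu:com.busuu/standard_event/jsonschema/1-0-1"))  # True
```

Note that the pointer is `/schema`, not `/$schema`: the `$schema` keyword at the top of a schema definition is not what this particular check is looking at.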

This is my Python code to send the event:

<-------------- TRACKER CODE --------------->
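from snowplow_tracker import Emitter, SelfDescribingJson, Subject, Tracker

# collector_uri, platform, uid, ip and the event fields are defined elsewhere in my code
e = Emitter(collector_uri)
s = Subject().set_platform(platform).set_user_id(uid).set_lang("enc").set_ip_address(ip)
t = Tracker(e, subject=s)

# The event must be a SelfDescribingJson whose schema is an iglu: URI.
# Note: this references version 1-0-1, while the schema below declares "version": "1-0-0".
event = SelfDescribingJson(schema="iglu:com.busuu/standard_event/jsonschema/1-0-1",
                           data={
                               "event": event_name,
                               "uid": uid,
                               "language_learnt": language_learnt,
                               "interface_language": interface_language,
                               "params": custom_context,  # must be a string (max 500 chars)
                               "platform": platform,
                               "app_id": app_id,
                               "version": version,
                               "environment": environment,
                               "user_agent": user_agent})

t.track_unstruct_event(event)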

<-------------- END TRACKER CODE --------------->
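`track_unstruct_event` takes a `SelfDescribingJson`, and the tracker then nests it inside its own `unstruct_event` envelope, as the decoded payload later in this thread shows. The sketch below rebuilds that nesting with plain dicts to show that both `schema` fields are `iglu:` URIs and no `http://` URL is involved. It is an illustration, not the tracker's actual code, and `unstruct_envelope` is a hypothetical name:

```python
import json

# Envelope schema as it appears in the decoded collector payload in this thread
UNSTRUCT_EVENT = "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"


def unstruct_envelope(schema: str, data: dict) -> dict:
    """Nest a self-describing event inside the unstruct_event envelope."""
    return {"schema": UNSTRUCT_EVENT, "data": {"schema": schema, "data": data}}


payload = unstruct_envelope("iglu:com.busuu/standard_event/jsonschema/1-0-1", {"event": "signup"})
print(json.dumps(payload, indent=2))
```

Because the tracker adds the outer layer itself, the event passed in should be a single `SelfDescribingJson`, not one wrapped in another.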

This is the schema that I am trying to validate against:

<-------------- SCHEMA VALIDATOR CODE --------------->
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for the busuu ",
  "self": {
    "vendor": "com.busuu",
    "name": "standard_event",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "event": { "type": "string", "maxLength": 255 },
    "uid": { "type": "string", "maxLength": 255 },
    "ts": { "type": "string", "maxLength": 255 },
    "language_learnt": { "type": "string", "maxLength": 255 },
    "interface_language": { "type": "string", "maxLength": 255 },
    "params": { "type": "string", "maxLength": 500 },
    "platform": { "type": "string", "maxLength": 255 },
    "app_id": { "type": "string", "maxLength": 255 },
    "version": { "type": "string", "maxLength": 255 },
    "environment": { "type": "string", "maxLength": 255 },
    "user_agent": { "type": "string", "maxLength": 255 }
  },
  "additionalProperties": false
}
<-------------- END SCHEMA VALIDATOR CODE --------------->
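Before sending events, the data can be checked against the constraints this schema uses: every property is a string, `params` allows 500 characters and the rest 255, and `additionalProperties` is false. The hand-rolled sketch below covers only those keywords; it is not a general JSON Schema validator, and `check_standard_event` is a hypothetical name. The decoded payload later in this thread sends `params` as a stringified dict, which passes `"type": "string"`; a real dict would not.

```python
import json

# Limits copied from the com.busuu/standard_event schema above (all properties are strings)
MAX_LENGTHS = {
    "event": 255, "uid": 255, "ts": 255, "language_learnt": 255,
    "interface_language": 255, "params": 500, "platform": 255,
    "app_id": 255, "version": 255, "environment": 255, "user_agent": 255,
}


def check_standard_event(data: dict) -> list:
    """Mimic the schema's type, maxLength and additionalProperties checks."""
    problems = []
    for key, value in data.items():
        if key not in MAX_LENGTHS:
            problems.append(f"{key}: not allowed (additionalProperties is false)")
        elif not isinstance(value, str):
            problems.append(f"{key}: expected string, got {type(value).__name__}")
        elif len(value) > MAX_LENGTHS[key]:
            problems.append(f"{key}: longer than {MAX_LENGTHS[key]} characters")
    return problems


ok = {"event": "signup", "params": json.dumps({"source": "email"})}
bad = {"event": "signup", "params": {"source": "email"}, "country": "GB"}
print(check_standard_event(ok))  # []
print(check_standard_event(bad))
```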

And finally, my resolver.json:

<-------------- RESOLVER CODE --------------->

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "busuu Iglu Repo",
        "priority": 5,
        "vendorPrefixes": [ "com.busuu" ],
        "connection": {
          "http": {
            "uri": "{ip of my resolver}"
          }
        }
      }
    ]
  }
}
<-------------- END RESOLVER CODE --------------->
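To check the resolver side, it helps to see which URLs an `iglu:` URI turns into. The error above shows Iglu Central serving schemas at `<uri>/schemas/<vendor>/<name>/<format>/<version>`. The sketch below assumes that layout for both repositories, and assumes the lookup order used by the Iglu Scala client: repositories whose `vendorPrefixes` match come first, then lower `priority` values. `lookup_urls` and `http://my-iglu-repo.example` (standing in for the `{ip of my resolver}` placeholder) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    priority: int
    vendor_prefixes: list
    uri: str


# Mirrors resolver.json; the second URI is a hypothetical stand-in for the placeholder.
REPOS = [
    Repo("Iglu Central", 0, ["com.snowplowanalytics"], "http://iglucentral.com"),
    Repo("busuu Iglu Repo", 5, ["com.busuu"], "http://my-iglu-repo.example"),
]


def lookup_urls(iglu_uri: str, repos=REPOS) -> list:
    """Return candidate HTTP URLs for an iglu: URI in the assumed lookup order."""
    if not iglu_uri.startswith("iglu:"):
        raise ValueError(f"not an iglu: URI: {iglu_uri}")
    vendor, name, fmt, version = iglu_uri[len("iglu:"):].split("/")
    # Assumed ordering: vendor-prefix matches first, then ascending priority.
    ordered = sorted(
        repos,
        key=lambda r: (not any(vendor.startswith(p) for p in r.vendor_prefixes), r.priority),
    )
    return [f"{r.uri.rstrip('/')}/schemas/{vendor}/{name}/{fmt}/{version}" for r in ordered]


for url in lookup_urls("iglu:com.busuu/standard_event/jsonschema/1-0-1"):
    print(url)
```

Fetching the first URL by hand is a quick way to confirm the schema is hosted at exactly that path, including the version (1-0-1 in the tracker code versus 1-0-0 in the schema's `self`).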


#2

Hi @brucey31, this is very odd.

I feel like somewhere in your code you must have self-describing JSONs that look like this:

{
  "schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "data": {
  }
}

However, I fully concede that this problem isn't present in the code you shared.
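One way to test that theory is to walk a decoded payload and list every `schema` field whose value is not an `iglu:` URI. A minimal sketch, assuming the payload has already been decoded to JSON; `bad_schema_refs` is a hypothetical helper, and `$schema` keys are deliberately ignored because the enrich error points at `/schema`:

```python
import json
import re

# Same pattern as in the enrich error ("-" escaped inside the classes)
IGLU_URI = re.compile(
    r"^iglu:[a-zA-Z0-9\-_.]+/[a-zA-Z0-9\-_]+/[a-zA-Z0-9\-_]+/[0-9]+-[0-9]+-[0-9]+"
)


def bad_schema_refs(node, path="$"):
    """Yield (path, value) for every "schema" key whose value is not an iglu: URI."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "schema" and isinstance(value, str) and not IGLU_URI.match(value):
                yield child, value
            yield from bad_schema_refs(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from bad_schema_refs(item, f"{path}[{i}]")


# A shortened decoded payload: both schema fields are iglu: URIs, so nothing is reported.
decoded = json.loads(
    '{"schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",'
    ' "data": {"schema": "iglu:com.busuu/standard_event/jsonschema/1-0-1",'
    ' "data": {"event": "signup"}}}'
)
print(list(bad_schema_refs(decoded)))  # []
```

Run against the JSON shape above, it would report `$.schema` with the `http://iglucentral.com/...` value.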


#3

Thanks for such a quick reply, Alex.

You are definitely right that the Iglu Central schema is being used unnecessarily.
When I decode the raw data line of the event straight out of the collector, before it hits the enricher, I get:

{
  "data": {
    "data": {
      "language_learnt": "{language_learnt}",
      "platform": "mob",
      "version": "{version}",
      "params": "{'source': '{source}', 'term': '{term}', 'group': '{group}', 'email': '{email}', 'campaign': '{campaign}'}",
      "uid": "{uid}",
      "user_agent": "{user_agent}",
      "environment": "{environment}",
      "interface_language": "{interface_language}",
      "event": "{event}",
      "app_id": "{app_id}"
    },
    "schema": "iglu:com.busuu/standard_event/jsonschema/1-0-1"
  },
  "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"
}

This looks fine to me. The only place I mention "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#" in my code is at the top of my custom schema (see SCHEMA VALIDATOR CODE).

Is there something wrong in my resolver.json that points the validator to the wrong place?


#4

Hey @brucey31:


#5

I am using whatever the Python tracker uses.

I changed the Iglu Central schemas and it worked!

Thanks for your help!


#6

Very odd!