[solved] Iglu-server does not validate my self-describing JSON against my self-describing JSON schema

Hi there,

I am trying to set up a custom context for tracking events related to A/B tests. To do this, I created the following self-describing JSON schema (company name changed to acme.com):

{
	"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
	"description": "Variation of a test on acme.com",
	"self": {
		"vendor": "com.acme",
		"name": "testvariation",
		"format": "jsonschema",
		"version": "1-0-0"
	},

	"type": "object",

	"properties": {

		"testId": {
			"description": "Identifier of the test",
			"type": "integer",
			"minimum": 0,
			"maximum": 2147483647
		},

		"variationId": {
			"description": "Identifier of the variation of the test",
			"type": "integer",
			"minimum": 0,
			"maximum": 2147483647
		}
	},

	"required": ["testId", "variationId"],
	"additionalProperties": false
}

I verified that this is a valid self-describing JSON schema using

./igluctl lint schemas/com.acme/testvariation/jsonschema/1-0-0

My intent with this schema was to validate self-describing JSON documents like this one (as described in the documentation):

{
    "schema": "iglu:com.acme/testvariation/jsonschema/1-0-0",
    "data": {
        "testId": 1234,
        "variationId": 5678
    }
}

I wanted to verify that it works, so I set up an iglu-server instance and uploaded the JSON schema:

curl -X POST 'http://192.168.99.100:30704/api/schemas/com.acme/testvariation/jsonschema/1-0-0?isPublic=true' -H 'apikey: my-write-key' -d @schemas/com.acme/testvariation/jsonschema/1-0-0 
{
  "status" : 201,
  "message" : "Schema successfully added",
  "location" : "/api/schemas/com.acme/testvariation/jsonschema/1-0-0"
}
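
To double-check that the upload went through, the schema can be read back from the same path with a GET. A quick sanity check along these lines ('my-read-key' is a placeholder for a read-capable API key):

# Fetch the stored schema back; it was uploaded with isPublic=true, so a
# read-capable key should suffice.
curl http://192.168.99.100:30704/api/schemas/com.acme/testvariation/jsonschema/1-0-0 \
  -H 'apikey: my-read-key'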

Now when I try to validate my self-describing JSON, I get this:

curl http://192.168.99.100:30704/api/schemas/validate/com.acme/testvariation/jsonschema/1-0-0 -X POST -F 'instance={"schema": "iglu:com.acme/testvariation/jsonschema/1-0-0", "data": {"testId": 1234, "variationId": 5678}}'
{
  "status" : 400,
  "message" : "The instance provided is not valid against the schema",
  "report" : {
    "level" : "error",
    "schema" : {
      "loadingURI" : "#",
      "pointer" : ""
    },
    "instance" : {
      "pointer" : ""
    },
    "domain" : "validation",
    "keyword" : "additionalProperties",
    "message" : "object instance has properties which are not allowed by the schema: [\"data\",\"schema\"]",
    "unwanted" : [ "data", "schema" ]
  }
}

However, if I try to validate the embedded payload directly, I obtain this:

curl http://192.168.99.100:30704/api/schemas/validate/com.acme/testvariation/jsonschema/1-0-0 -X POST -F 'instance={"testId": 1234, "variationId": 5678}'
{
  "status" : 200,
  "message" : "The instance provided is valid against the schema"
}

I am utterly confused as to why this is happening. Why is my self-describing JSON not validating?

The documentation on custom contexts specifies that, on the frontend side, the JavaScript tracker should send custom contexts as self-describing JSON. Why does only the payload validate, then?

@Christophe-Marie_Duq, from the sound of it, you seem to have submitted the data for validation incorrectly. The error says:

    "message" : "object instance has properties which are not allowed by the schema: [\"data\",\"schema\"]",
    "unwanted" : [ "data", "schema" ]

You have already specified the schema in your URI: curl http://192.168.99.100:30704/api/schemas/validate/com.acme/testvariation/jsonschema/1-0-0. I suspect that your data should simply be {"testId": 1234, "variationId": 5678} when validating (I haven’t used curl for this purpose myself, so I can’t be certain).

From what you are saying, I assume that when an Iglu client validates a self-describing JSON against the matching self-describing JSON schema, it looks at the value associated with the "schema" key, resolves that schema, and then checks whether the schema validates the value associated with the "data" key.

You are saying that since we already specify which schema to use in the request URL (after /api/schemas/validate), there is no need to apply this resolution mechanism here. However, I think it would be less confusing if one could feed the full self-describing JSON to the validation service (and not just the value associated with "data") and let the service resolve the schema and validate the data the same way an Iglu client would.

With the current approach, if I make a typo in the "schema" value, I won’t see it.
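
For what it’s worth, that resolution step is easy to emulate in the shell. A rough sketch, assuming jq is available and the self-describing JSON is saved as event.json (both assumptions on my part):

# Extract the schema reference, strip the iglu: prefix, then validate the
# "data" payload against that schema path. A typo in the "schema" value
# would surface here as an error from the server instead of going unnoticed.
SCHEMA_PATH=$(jq -r '.schema' event.json | sed 's/^iglu://')
DATA=$(jq -c '.data' event.json)
curl "http://192.168.99.100:30704/api/schemas/validate/${SCHEMA_PATH}" \
  -X POST \
  -F "instance=${DATA}"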

Hi @Christophe-Marie_Duq,

If you’re looking for a less confusing way of testing custom events & schemas, take a look at Snowplow Mini.

You can just send data using one of the trackers, or use the Snowplow Tracking CLI.
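
For example, something along these lines with the Tracking CLI (treat this as a sketch; COLLECTOR_HOST is a placeholder, and the exact flags should be checked against the CLI’s README for your version):

# Send the A/B-test payload as a self-describing event to a collector.
./snowplow-tracking-cli \
  --collector COLLECTOR_HOST \
  --appid test \
  --schema iglu:com.acme/testvariation/jsonschema/1-0-0 \
  --json '{"testId": 1234, "variationId": 5678}'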

Best,

@Christophe-Marie_Duq, there are two validation methods available:

  • Validates that a schema is self-describing
  • Validates an instance against its schema

Here’s the wiki: https://github.com/snowplow/iglu (Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow).

The first validates the schema itself. The second validates data against the corresponding schema.

Thus, if you meant to validate the schema itself, your command would look like this:

curl \
  HOST/api/schemas/validate/jsonschema \
  -X POST \
  -F 'schema={ "schema": "to be validated" }'
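
Applied to your schema file, that would be roughly the following (curl’s 'schema=<file' syntax reads the file’s contents into the form field):

curl \
  http://192.168.99.100:30704/api/schemas/validate/jsonschema \
  -X POST \
  -F 'schema=<schemas/com.acme/testvariation/jsonschema/1-0-0'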

My first comment was related to the second validation option.

If you made a typo, you would refer to a non-existent schema. Again, these are self-describing events we are talking about. There’s no dedicated endpoint to send each custom schema to. The collector’s endpoints are predefined, and you would typically send an event to the /i (GET) or /com.snowplowanalytics.snowplow/tp2 (POST) endpoint no matter which custom event type the event belongs to. The collector will take them all. It is the enrichment process that checks the events against the JSON schema the event itself tells it to check against. The enrichment process has no means of knowing which JSON schema you meant to use if you mistyped it.
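
To make that concrete, here is roughly what a tracker’s GET request to the /i endpoint would look like with your context attached (a sketch; COLLECTOR_HOST is a placeholder, and trackers use URL-safe base64 for the cx parameter):

# Wrap the custom context in the standard contexts envelope, base64-encode
# it into the cx query parameter, and send it with a page-view event (e=pv).
CONTEXTS='{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.acme/testvariation/jsonschema/1-0-0","data":{"testId":1234,"variationId":5678}}]}'
CX=$(echo -n "$CONTEXTS" | base64 | tr -d '\n' | tr '+/' '-_')
curl "http://COLLECTOR_HOST/i?e=pv&p=web&cx=${CX}"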

As @Colm pointed out, to ensure your tracking has been set up correctly, use Snowplow Mini as your testing environment before deploying the tracking code to production.

And, yes, you will see an error (a mistyped schema, for example), as your event will be filtered out and end up in either the bad index (Elasticsearch) or the bad bucket, or both, depending on your pipeline implementation. The bad event will contain an error message.


Thank you for your support!