Snowplow JSON validator failed to validate faulty property_names in array

Hi team,

We have a type of event that should fail JSON validation at the Stream Enrich step; however, Stream Enrich fails to recognize it as a malformed event.

The schema looks like this:

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for customer",
  "self": {
    "vendor": "au.com.some.vendor",
    "name": "customer",
    "format": "jsonschema",
    "version": "2-0-1"
  },
  "type": "object",
  "properties": {
    "customer_type": {
      "type": "array",
      "items": {
        "type": "string",
        "enum": ["residential", "unknown","unknown_verified"]
      }
    },
    "total_claimed_profiles": {
      "type": "integer"
    },
    "about_me": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type_of_work": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        }
      }
    },
    "agents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "agency_id": {
            "type": "string"
          },
          "agent_profile_id": {
            "type": "string"
          },
          "permissions": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        }
      }
    }
  },
  "required": ["customer_type"],
  "additionalProperties": false
}

The event payload looks like this:

{
   "schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
   "data":[
      {
         "schema":"iglu:au.com.some.vendor/customer/jsonschema/2-0-1",
         "data":{
            "customer_type":[
               "unkown"
            ],
            "about_me":[
               {
                  "type_of_work":[
                     
                  ]
               }
            ],
            "total_claimed_profiles":0,
            "agents":[
               {
                  "agency_id":"BWXGOX",
                  "agent_profile_id":"1BDF564F",
                  "permission":[
                     "basic",
                     "rent_application_management"
                  ]
               }
            ]
         }
      }
   ]
}

Notice that the schema defines a field called permissions, while the event payload calls it permission. The JSON validator didn’t pick up this error.

Similarly, I changed the field from agency_id to agency_ids in the event payload and sent the event to the Snowplow collector. It passed the JSON validator too.

We are using Snowplow Stream Enrich v1.3.2.

I’ve gone over the source code but could not figure out why. Any suggestions would be appreciated.

Cheers

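The behaviour you are seeing is expected: that instance of data should validate against that schema. The additionalProperties keyword only applies to the object level where it is declared, so the top-level setting does not cover the objects inside agents. If you want the event to fail against that schema, you’ll need to specify additionalProperties as false on the items of your agents array, e.g.,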

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for customer",
  "self": {
    "vendor": "au.com.some.vendor",
    "name": "customer",
    "format": "jsonschema",
    "version": "2-0-1"
  },
  "type": "object",
  "properties": {
    "customer_type": {
      "type": "array",
      "items": {
        "type": "string",
        "enum": ["residential", "unknown","unknown_verified"]
      }
    },
    "total_claimed_profiles": {
      "type": "integer"
    },
    "about_me": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type_of_work": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        }
      }
    },
    "agents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "agency_id": {
            "type": "string"
          },
          "agent_profile_id": {
            "type": "string"
          },
          "permissions": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        },
        "additionalProperties": false
      }
    }
  },
  "required": ["customer_type"],
  "additionalProperties": false
}

which will then result in a bad row:

Property ‘permission’ has not been defined and the schema does not allow additional properties.
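
To see why the nested typo slips through, here is a minimal Python sketch of how additionalProperties is scoped. It is not the validator Stream Enrich uses: find_undeclared and customer_schema are hypothetical helpers, only properties, items and a boolean additionalProperties are handled, and the schema is trimmed to the agents field.

```python
def find_undeclared(schema, instance, path="$"):
    """Return paths of properties rejected by additionalProperties: false.

    Handles only `properties`, `items` and a boolean `additionalProperties`;
    a real JSON Schema validator covers far more keywords.
    """
    errors = []
    if isinstance(instance, dict):
        declared = schema.get("properties", {})
        # The keyword only constrains the object level it is declared on,
        # and extra properties are allowed when it is absent.
        closed = schema.get("additionalProperties", True) is False
        for key, value in instance.items():
            if key in declared:
                errors += find_undeclared(declared[key], value, f"{path}.{key}")
            elif closed:
                errors.append(f"{path}.{key}")
    elif isinstance(instance, list) and isinstance(schema.get("items"), dict):
        for i, item in enumerate(instance):
            errors += find_undeclared(schema["items"], item, f"{path}[{i}]")
    return errors


def customer_schema(close_agent_items):
    """Trimmed copy of the customer schema, keeping only the `agents` field."""
    agent = {
        "type": "object",
        "properties": {
            "agency_id": {"type": "string"},
            "agent_profile_id": {"type": "string"},
            "permissions": {"type": "array", "items": {"type": "string"}},
        },
    }
    if close_agent_items:
        agent["additionalProperties"] = False
    return {
        "type": "object",
        "properties": {"agents": {"type": "array", "items": agent}},
        "additionalProperties": False,  # closes the top level only
    }


event = {
    "agents": [
        {"agency_id": "BWXGOX", "agent_profile_id": "1BDF564F", "permission": ["basic"]}
    ]
}

original_errors = find_undeclared(customer_schema(False), event)
amended_errors = find_undeclared(customer_schema(True), event)
print(original_errors)  # []
print(amended_errors)   # ['$.agents[0].permission']
```

With the original schema the misspelled key is simply an undeclared (and therefore allowed) property of the agent object; once the items schema is closed, it is reported.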

I see. Thanks for your reply Mike. Did a test and it works.

Can I ask how the payload data is validated against the schema? In iglu-scala-client, the validate() method seems to have no implementation (maybe I just didn’t find it). Link to the validate() definition: iglu-scala-client/Validator.scala at 379c433048a9899fb55e8f9151f3c85b91cc927b · snowplow/iglu-scala-client · GitHub

Have a look at the check function in the Iglu Scala client. This method also performs resolution and validation of the schema itself as well as the data against the schema - but I think that’s what you are after.

Yes, I noticed this function, but at line 44 it calls validator.validate(instance.data, schema). I can’t see where the validate() logic itself lives, e.g. where it checks for additionalProperties.

Snowplow uses a third party library (networknt/json-schema-validator) for the validation logic itself - so the code you’ll find will mostly be calls out to the library rather than the parsing / validation logic itself (as validating JSON schemas is itself quite complicated).

I suspect you’ll find the logic you’re looking for in the third party library here, as part of the AdditionalPropertiesValidator.
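
Conceptually, the rule itself is small. The sketch below follows the JSON Schema definition of additionalProperties, where a property counts as additional when it is neither listed in properties nor matched by a patternProperties regex. It is an illustration in Python with hypothetical function names, not a port of the library's code, and the error text simply mirrors the bad row message quoted earlier in this thread.

```python
import re


def additional_property_names(schema, obj):
    """Names in `obj` that are neither declared nor matched by a pattern."""
    declared = set(schema.get("properties", {}))
    # patternProperties keys are unanchored regexes, hence search() not match().
    patterns = [re.compile(p) for p in schema.get("patternProperties", {})]
    return sorted(
        name
        for name in obj
        if name not in declared and not any(p.search(name) for p in patterns)
    )


def check_additional_properties(schema, obj):
    """Return error messages for one object level; an empty list means valid."""
    allowed = schema.get("additionalProperties", True)
    if allowed is not False:
        # Absent or true: extra names are accepted. A sub-schema would be
        # applied to the extra values, which this sketch does not implement.
        return []
    return [
        f"Property '{name}' has not been defined and the schema "
        "does not allow additional properties."
        for name in additional_property_names(schema, obj)
    ]


agent_schema = {
    "type": "object",
    "properties": {
        "agency_id": {"type": "string"},
        "agent_profile_id": {"type": "string"},
    },
    "additionalProperties": False,
}
agent = {"agency_ids": "BWXGOX", "agent_profile_id": "1BDF564F"}

errors = check_additional_properties(agent_schema, agent)
print(errors)
```

Dropping the additionalProperties line from agent_schema makes the same call return an empty list, which is the behaviour seen with the original schema.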
