Self-describing event jsonschema defaults don't populate

Hi, I am instrumenting Self-Describing (unstruct) events based on a rigid taxonomy. We want to capture categories and sub-categories for events, but not expose those as properties that developers have to specify in the tracking call.

For example, the event “Make Payment Failed” would be tracked as the unstruct event payments__make_payment_failed with category: "Billing", subcategory: "Payments", and property failure_reason.

We want failure_reason passed as a property in the tracker call, but we don’t want developers to pass in category and subcategory because of the potential for error.

We tried to do this with the JSON Schema default keyword in the Self-Describing event's schema. After firing some test events, however, we discovered that the validator in the Snowplow pipeline does not populate the default values.
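To make the setup concrete, here is a minimal sketch of a schema like the one described above, written as a Python dict. The vendor `com.example`, the version `1-0-0`, the property constraints and the helper `absent_defaulted_fields` are all hypothetical. The helper lists the properties that declare a default but are missing from the payload the tracker sent; a validator that only checks conformance leaves exactly these fields absent.

```python
# Hypothetical schema for the event described above; vendor "com.example"
# and the property constraints are assumptions for illustration.
SCHEMA = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {
        "vendor": "com.example",
        "name": "payments__make_payment_failed",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        "category": {"type": "string", "default": "Billing"},
        "subcategory": {"type": "string", "default": "Payments"},
        "failure_reason": {"type": "string"},
    },
    "required": ["failure_reason"],
    "additionalProperties": False,
}

# What the tracker sends: only the property developers are asked to supply.
EVENT = {
    "schema": "iglu:com.example/payments__make_payment_failed/jsonschema/1-0-0",
    "data": {"failure_reason": "card_declined"},
}


def absent_defaulted_fields(data: dict, schema: dict) -> list:
    """List properties that declare a default but are missing from the data.

    Validation only reports whether the data conforms; it does not write
    these defaults into the payload, so they stay absent downstream.
    """
    return [
        name
        for name, spec in schema.get("properties", {}).items()
        if "default" in spec and name not in data
    ]


print(absent_defaulted_fields(EVENT["data"], SCHEMA))
# ['category', 'subcategory']
```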

Is this functionality not implemented?
Does anyone know of a way to get the behavior we want?

Thanks!

Hi @mariah.rogers,

This is indeed a known limitation: Iglu doesn't support the JSON Schema default keyword. The main reason it still isn't implemented is that there are several possible ways to implement it:

  1. As you mentioned, substitute it at the enrichment step. But that means enrichment is mutating user data. Can downstream consumers then treat the field as required, since it's always present?
  2. Why should mutation happen on the enrichment side at all? Maybe it should be done at the loading step: many databases support a DEFAULT option in DDL, so we could use it wherever possible and substitute values manually for databases that don't support it.
  3. Technically, no implementation is also a valid implementation, as the specification doesn't say (as far as I know; I could be wrong, but I'm too lazy to check) that the default should be substituted. Instead, it's just a hint telling the user to be aware of some special value.
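Option 2 could look roughly like the following sketch, which turns a schema's defaults into DEFAULT clauses in a CREATE TABLE statement. The table name, the string-only VARCHAR mapping and the generic SQL dialect are assumptions for illustration; this is not how Snowplow's loaders generate DDL today.

```python
# Sketch of option 2: push defaults into the warehouse table definition.
# Table name, VARCHAR mapping and dialect are hypothetical assumptions.

PROPERTIES = {
    "category": {"type": "string", "default": "Billing"},
    "subcategory": {"type": "string", "default": "Payments"},
    "failure_reason": {"type": "string"},
}


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def column_ddl(name: str, spec: dict) -> str:
    """Render one column; only string properties are handled in this sketch."""
    if spec.get("type") != "string":
        raise ValueError(f"unsupported type for {name}: {spec.get('type')}")
    column = f"{name} VARCHAR"
    if "default" in spec:
        column += f" DEFAULT {sql_literal(spec['default'])}"
    return column


def create_table_ddl(table: str, properties: dict) -> str:
    """Build a CREATE TABLE statement with one column per schema property."""
    columns = ",\n  ".join(column_ddl(n, s) for n, s in properties.items())
    return f"CREATE TABLE {table} (\n  {columns}\n);"


print(create_table_ddl("payments__make_payment_failed_1", PROPERTIES))
```

Keep in mind that a column DEFAULT only applies when an INSERT omits the column; a loader that writes an explicit NULL for a missing field bypasses it.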

So, long story short: we couldn't settle on the right approach and decided to postpone it. However, now that you've brought it up again, I think the best implementation would be a configurable enrichment, i.e. an enrichment that, when enabled, substitutes defaults with their respective values for all schemas (or for some predefined subset). Anyway, I've created a ticket to consider it:
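The configurable enrichment proposed above might look roughly like this minimal sketch. `DefaultsEnrichment`, `apply_defaults` and scoping by Iglu URI are hypothetical names and choices, not an existing Snowplow API. The sketch only handles top-level properties and keeps any value the tracker sent explicitly.

```python
import copy
from typing import Optional, Set


def apply_defaults(data: dict, schema: dict) -> dict:
    """Return a copy of `data` with missing top-level defaults filled in.

    Values the tracker sent explicitly are kept as-is; nested objects are
    not traversed in this sketch.
    """
    filled = copy.deepcopy(data)
    for name, spec in schema.get("properties", {}).items():
        if "default" in spec and name not in filled:
            filled[name] = copy.deepcopy(spec["default"])
    return filled


class DefaultsEnrichment:
    """Hypothetical opt-in enrichment that substitutes schema defaults."""

    def __init__(self, enabled: bool, schema_uris: Optional[Set[str]] = None):
        self.enabled = enabled
        # None means "every schema"; otherwise only the listed Iglu URIs.
        self.schema_uris = schema_uris

    def process(self, event: dict, schema: dict) -> dict:
        """Return the event unchanged, or a new event with defaults filled."""
        if not self.enabled:
            return event
        if self.schema_uris is not None and event["schema"] not in self.schema_uris:
            return event
        return {"schema": event["schema"], "data": apply_defaults(event["data"], schema)}


# Usage with a hypothetical schema and event.
URI = "iglu:com.example/payments__make_payment_failed/jsonschema/1-0-0"
SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "default": "Billing"},
        "subcategory": {"type": "string", "default": "Payments"},
        "failure_reason": {"type": "string"},
    },
}
event = {"schema": URI, "data": {"failure_reason": "card_declined"}}

enrichment = DefaultsEnrichment(enabled=True, schema_uris={URI})
print(enrichment.process(event, SCHEMA)["data"])
# {'failure_reason': 'card_declined', 'category': 'Billing', 'subcategory': 'Payments'}
```

Keeping explicitly sent values means a developer can still override a category; it also means that, as point 1 above notes, downstream consumers can no longer tell a sent value from a substituted one.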

Thanks for raising this; let us know what you think of the proposal.


What's the potential source of error here? I would have thought it would be easier to add these values in the tracking call.

As @anton mentioned above, implementing this is tricky: some databases (BigQuery, Snowflake) don't support adding a default value at insert time, and if it's added at enrichment time, the enrichment has to mutate the event, which may break a consumer contract.

@mike Thanks! I figured those might be some reasons why.

Our potential for error is actually the chance of developers taking liberties with the event calls. We developed a taxonomy for events across 5+ products, and each team instruments tracking themselves. In the past, teams have tended to stray from our docs and best practices, especially when product owners are familiar with the taxonomy but the devs themselves aren't, and the devs just start implementing events based on Jira tickets without delving into the tracking plan and taxonomy docs.

@anton Yes, everything I found online about JSON Schema says that no implementation is a valid implementation :sweat_smile:

That proposal sounds great! Looking forward to it.