I updated a schema from 1-0-0 to 1-0-1 and realized that this sent millions of events into enriched_bad. One example is Googlebot page_views, of which we get around 30,000 to 100,000 per day.
I set up the new schema with `minProperties` set to 1, which seems to be the problem here: Googlebot apparently uses some kind of cache and keeps sending data against the 1-0-0 schema rather than 1-0-1. Even 5 days after my schema update, most of the Googlebot events still only carry 1-0-0 schema values.
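For context, the change in 1-0-1 boils down to adding this constraint (a simplified fragment of the JSON Schema, not the full schema):

```json
{
  "type": "object",
  "minProperties": 1
}
```

Per the JSON Schema spec, an empty object `{}` fails `minProperties: 1`, so any event arriving without payload data gets rejected into the bad bucket.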
There are also regular user events in between, as well as events from the Facebook in-app browser, since FB seems to have some weird caching as well (we have had numerous technical problems with it already).
I wonder how I can avoid this for future schema updates. If I try to enforce stricter schemas, for example with `minProperties` set, will I always end up with a lot of events in the bad bucket, as far as I understand?
Is there some logic that could, for example, check during enrichment whether the data validates against 1-0-1 and, if not, check 1-0-0 before sending the event to the bad bucket? I can see new problems coming up with that, though. To name one: how to handle null values under the new schema if the database fields don't allow nulls.
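To make the idea concrete, here is a rough sketch in plain Python of the fallback logic I have in mind. The version numbers and the reduction of each schema to just a `minProperties` requirement are illustrative; real enrichment would validate against the full Iglu schemas:

```python
from typing import Optional

# Hypothetical per-version requirements: 1-0-1 added minProperties: 1,
# while 1-0-0 still accepted empty objects.
SCHEMA_MIN_PROPERTIES = {"1-0-1": 1, "1-0-0": 0}

def resolve_schema_version(payload: dict) -> Optional[str]:
    """Try the newest schema version first, then fall back to the older one.

    Returns None when no version validates, i.e. the event would still
    land in the bad bucket.
    """
    for version in ("1-0-1", "1-0-0"):
        # JSON Schema's minProperties simply counts the object's keys
        if len(payload) >= SCHEMA_MIN_PROPERTIES[version]:
            return version
    return None

print(resolve_schema_version({"page_type": "blog"}))  # -> 1-0-1
print(resolve_schema_version({}))                     # -> 1-0-0 (cached client fallback)
```

Even with something like this, the null-handling problem above remains: an event rescued under 1-0-0 may still be missing fields that the downstream tables expect to be non-null.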