Why is the Redshift table definition for a schema not the latest version?


#1

A partner recently asked us:

We are passing mobile_context in iOS, shouldn’t we have networkType and networkTechnology in schema 1-0-1? Currently it’s not even a column in the table. Schema in the table does say 1-0-1.

The updated Redshift table definition and JSON Paths files for mobile_context version 1-0-1 are being “held back” in the Blocked schemas milestone currently. That milestone contains various other contexts and events which are lagging the latest schema published in Iglu Central.

These migrations are being held back because releasing them would break existing Snowplow installations. If we upgrade the JSON Paths file in s3://snowplow-hosted-assets to contain all the 1-0-1 fields, then this upgraded version will conflict at load time with the existing 1-0-0 tables deployed in users’ Redshift databases. Snowplow users’ pipelines would then fail to load Redshift through no fault of their own.

This is due to an architectural mistake we introduced with our original shredding technology. The medium-term fix is to implement the full table migration capabilities that we are working towards in Iglu and Snowplow. In the short term, there is an easy workaround:

  1. Deploy the 1-0-1 table version into your Redshift database
  2. Add the 1-0-1 JSON Paths file into your own jsonpath_assets: S3 bucket at the appropriate path

Because of the way that the StorageLoader works, your local copy of the 1-0-1 JSON Paths file will take priority over the centrally hosted 1-0-0 version in s3://snowplow-hosted-assets.
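
To make the priority rule concrete, here is a minimal sketch of a "first match wins" lookup. It is not the StorageLoader's actual code: the function name, the local bucket name and the relative path are assumptions made up for illustration, and the existence check is passed in as a plain callable so that the example runs without touching S3.

```python
from typing import Callable, Iterable, Optional


def resolve_jsonpaths(
    buckets: Iterable[str],
    relative_path: str,
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate URI that exists, checking buckets in priority order."""
    for bucket in buckets:
        uri = bucket.rstrip("/") + "/" + relative_path.lstrip("/")
        if exists(uri):
            return uri
    return None


# Hypothetical names, for illustration only.
LOCAL = "s3://my-company-jsonpaths"                            # your own jsonpath_assets bucket
HOSTED = "s3://snowplow-hosted-assets"                         # centrally hosted assets
PATH = "com.snowplowanalytics.snowplow/mobile_context_1.json"  # assumed file layout

# Pretend both buckets hold a file at the same relative path.
available = {LOCAL + "/" + PATH, HOSTED + "/" + PATH}

# The local bucket is listed first, so its copy is the one that gets used.
chosen = resolve_jsonpaths([LOCAL, HOSTED], PATH, available.__contains__)
print(chosen)
```

If the local copy is missing, the same lookup falls through to the hosted version, which is why the workaround only affects the schemas you explicitly override.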


#2

@alex We upgraded our Android tracker and we see a huge drop in android records in com_snowplowanalytics_snowplow_mobile_context_1 since then.
Looking at the code at https://github.com/snowplow/snowplow-android-tracker/commit/5bd03003f9d907a71fe668315604d7b9cdc485bb#diff-1f1672833717c43ae25052cb2231cb6dR30 it seems like you moved to 1-0-1 with the latest release.

I’m not completely sure I understand the problem you present, but I suspect events with iglu:com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-0-1 are being dropped silently (i.e. the Storage Loader doesn’t fail).

Can you please advise?

Thanks


#3

Hmm, interesting @danielz.

What MAXERROR setting are you using in your config.yml for Redshift?


#4

@alex MAXERROR = 1

storage:
  download:
    folder: 
  targets:
    - name: Snowplow Redshift DB
      type: redshift
      host: {host}
      database: {database}
      port: 5439
      table: atomic.events
      username: {username}
      password: {password}
      maxerror: 1
      comprows: 200000
      ssl_mode: disable

Are we supposed to create a new table, or just run a migration?
If so, where can I find the migration files?


#5

Right - so maxerror is 1 - this means that the load will fail if any of the events (or child tables) cannot be loaded into Redshift.

Therefore, if you are seeing a significant drop in your volumes in Redshift, the problem must lie upstream at the validation phase (inside of Hadoop Enrich). Can you look at the bad rows (specifically :enriched:bad) and check:

  1. If volumes in there seem elevated compared to before the upgrade
  2. If there are new validation failures in there that you haven’t seen before
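
A minimal sketch for checking both points, assuming the bad rows have been downloaded locally as newline-delimited JSON in which each row carries an "errors" array (either plain strings or objects with a "message" field). The function name and the sample rows are made up for illustration; adjust the field names to whatever your bad rows actually contain.

```python
import json
from collections import Counter
from typing import Iterable


def tally_errors(lines: Iterable[str]) -> Counter:
    """Count how often each error message appears across bad rows."""
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            row = json.loads(raw)
        except json.JSONDecodeError:
            counts["<unparseable bad row>"] += 1
            continue
        if not isinstance(row, dict):
            counts["<unparseable bad row>"] += 1
            continue
        for err in row.get("errors", []):
            # Errors may be plain strings or objects with a "message" field.
            message = err.get("message", "") if isinstance(err, dict) else str(err)
            counts[message[:120]] += 1  # truncate so long messages still group together
    return counts


# Made-up sample rows; in practice, iterate over the lines of your bad rows files.
sample = [
    '{"line": "a", "errors": ["schema mismatch"]}',
    '{"line": "b", "errors": [{"level": "error", "message": "schema mismatch"}]}',
    "not json",
]
counts = tally_errors(sample)
for message, n in counts.most_common(10):
    print(n, message)
```

Running this over the files from before and after the upgrade and comparing the top messages should show whether a new validation failure has appeared.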

Thanks,

Alex


#6

@alex it doesn’t seem like there’s a significant increase. Regardless I’m trying to look into those files we do have. I’m currently using less / tail / grep / cat but there must be a better way to do that?

Any other areas we might want to look into?
It is also still unclear to me whether or not what we’re seeing is expected due to the fact the schema was upgraded and whether or not we should be running some migrations.

Thanks


#7

Hi @danielz we have identified an issue in the 0.6.0 release that might be causing an invalid value to end up in the mobile_context which would then cause it to fail schema validation.

An upgrade to 0.6.1 should fix this issue and should then return your event volumes to expected levels - unless there is a different problem at play here which is not related to the Tracker.


#8

Awesome @josh . Will do that asap. Do you mind pointing me to the relevant Github issue / commit?

Thanks


#9

Hey @danielz the relevant commit: https://github.com/snowplow/snowplow-android-tracker/commit/d24eb736f8cdb1b82a693baeb9ec0a9881023c26

The topic covering the issue: Snowplow Android Tracker 0.6.1 released with mobile_context fix [IMPORTANT UPGRADE for anyone running 0.6.0]


#10

Hi @josh @alex
We upgraded the dependency but it doesn’t seem to solve the problem with the number of events in mobile_context table being extremely low.

Any ideas on how we can diagnose this problem other than downgrading back to 0.5?


#11

Hi @danielz,

At this point, the best thing to do is to set up sinking your bad events into Elasticsearch and do some analysis in e.g. Kibana to identify which validation failures are occurring the most.


#12

Hi @alex I looked through the bad enriched events and the bad shredded events and although I found a few interesting things, I see nothing to do with this problem. We’re losing 10s of millions of mobile_context records a day and in the enriched events we have only about 177k and almost 0 in the shredded.

Where else can it get lost? Where is this data taken from in the protocol?

It feels to me like something is not right and is not working well anymore. We might need to downgrade if we can’t resolve the issues asap as we’re losing a lot of critical data so I’d be extremely grateful if you could point us to the right direction.

Thanks in advance
Daniel


#13

Hi Daniel,

Just to make sure we understand the situation exactly: are you saying that you are still getting 10s of millions of Android events but without the mobile_context? If this is the case, then it would be good to check your implementation to ensure that the mobile_context is selected to be sent along with the event.

From 0.5.4 to 0.6.x this was changed from being a part of the Subject object to being a part of the Tracker object through the mobileContext(true|false) builder option of the Tracker.

Would you mind pasting in your Tracker setup here so I can see if there is anything wrong here?


#14

Thanks @josh!
This actually was not set.
Looking at the docs, I see the default is False.
We now set it as advised and we see the context is sent - I really hope this fixes the problem, we should know within 24 hours but it seems like this is it!

I checked with our engineers and before 0.6.x we never had to explicitly declare anything about mobile-context. My small piece of feedback is that this is not a completely backwards-compatible change. Mobile (or “frontend” for that matter) developers are not fully aware of what’s happening under the hood at the protocol level and how events are structured. Therefore it is not completely clear from the release notes that if one upgrades and doesn’t add this setting, it’ll change the way things are tracked. I’d strongly advise specifying that in the future.

Thanks for all the great and speedy help!
Daniel


#15

Thanks @danielz - that’s fair feedback. @josh - can you add a bolded warning into section 7. API changes in the blog post: