Upgrade from 2-yr-old version, where is unstruct_event col?


#1

We recently switched from Snowplow v2.3.0, which I think is about the same as R66, to R95. We had to do this rather suddenly on 12/15, when AWS removed AMI-3.6.0 from EMR, and our ETL jobs quit working.

Rather than try to upgrade through 29 or so releases, we’re working with a fresh installation of R95 now.

Everything seems to be working OK, except: Now there is no unstruct_event column in atomic.events.

We had one type of unstruct event that was not being shredded into its own table. Instead, in downstream processing, we grabbed the json out of the unstruct_event column and parsed it ourselves. (We did this because it is a fairly new data enhancement we get from a third party API, and their json record layout is subject to change.)

So my question is: Is this raw json still being written to Redshift, somewhere I haven’t found it yet? Or can it be?


#2

Hi @wleftwich,

Those JSON columns were removed just over two years ago (which is perhaps why you hadn’t upgraded already). The relevant release was Release 73 Cuban Macaw (2015-12-04):

Has the third party API continued to be unstable for the past 2 years? Perhaps it’s stabilised now and you could formalize the schema.

We have some plans to better handle this kind of situation (unreliable third-party making schema’ing difficult) - expect an RFC from us on schema inference sometime in the new year.


#3

Thanks Alex. The third party API is only a few months old. We have not upgraded in two years, I guess, because the old version was meeting requirements.

So, just to confirm, my best bet for getting the data from the third party API is to create an iglu schema, right? And if that doesn’t work because of flaky JSON, I guess I will grovel through the raw log archives.


#4

Hey @wleftwich,

Yes - if it were me, I would:

  • Use Redshift UNLOAD to dump all of the unstruct_event for the unreliable API events to S3
  • Run Schema Guru on the output
  • Manually inspect and “loosen” the schema

Schema Guru is here;

Hope this helps,

Alex