How to shred events into Redshift from the real-time pipeline?


#1

Hello,

We are currently attempting to implement Snowplow as an alternative to an older, non-scalable analytical process.

I’ve set up the Scala collector, enrichment and a sink (taken from https://github.com/jramos/snowplow-kinesis-redshift-sink, which seemed to be the only way to get Kinesis stream data into Redshift), but have run into a couple of problems:

  • Shredding appears to be impossible with this setup: derived contexts now end up as JSON in the events table, not in dedicated tables such as com_snowplowanalytics_snowplow_ua_parser_context_1 as intended. How can we build a pipeline on the real-time Scala & Kinesis combination that loads data into Redshift and still supports shredding?
  • The current analytical process has quite a lot of custom enrichment with internal data; a single event might, for example, need to call several API endpoints and combine their results (this data changes only sporadically, say minor changes once a month). What is the recommended way to do this? I tried solving it with the API enrichment and the JavaScript enrichment, but the API enrichment does not allow custom logic, and I have not been able to construct a valid JavaScript enrichment that allows for a delay (or returns callbacks/promises). The alternative would be loading the complete database into Redshift and doing the merges in the analytical process, which might be overkill and adds complexity, as it would need to manage changes as well.

Any recommendations for the above 2 problems? :slight_smile:

Thanks in advance (and for snowplow in general),

Simon


#2

Hey @esquire900 - would you mind re-posting the second question as a separate thread? Multiple distinct questions per thread isn’t great for answers or for future searching.


#3

Hi @esquire900,

To answer your first question, did you take a look at this topic?

Regards,
Ihor