Snowplow Kinesis to EmrEtl

Hi

I am completely new to the Snowplow world, but I have successfully set up the Snowplow environment end to end and have data flowing into Redshift.

My issue is that I cannot find how to pull the enriched data from Kinesis into EmrEtlRunner to be loaded into Redshift.
So I end up enriching twice: once for Kinesis (which we need), and then again by running the equivalent of batch processing on the raw S3 logs so the data gets enriched and loaded by EmrEtlRunner.

There must be a less resource-intensive way, but I've been looking through the documentation and the forum and can't find an answer anywhere - please help!

@rowan, indeed, there are two ways of running a Lambda architecture. You can find more details in this Discourse post: How to setup a Lambda architecture for Snowplow.

In short, you need to run S3 Loader on the stream-enriched data and run EmrEtlRunner in Stream Enrich mode. This way the data is enriched only once (in the Stream Enrich component), and EmrEtlRunner is used just to shred the data and load it into Redshift.

@ihor Thank you for your response and the details, it's really appreciated. I have pored over this area during the past couple of days and I can't find how to invoke Stream Enrich mode, hence the switch to batch processing. Is it a change I need to make in the config somewhere, or is it a specific command I need to use? Many thanks in advance for your help.

@rowan, adding the enriched:stream bucket to config.yml is sufficient. EmrEtlRunner is smart enough to realize you are running it in Stream Enrich mode from the presence of that bucket: https://github.com/snowplow/snowplow/blob/master/3-enrich/emr-etl-runner/config/stream_config.yml.sample#L15.
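
For reference, the relevant part of config.yml might look roughly like the sketch below. It assumes the key layout of the linked stream_config.yml.sample (aws > s3 > buckets > enriched); the bucket paths are made-up placeholders, and every other setting is omitted, so check it against the sample for your EmrEtlRunner version.

```yaml
# Fragment of config.yml only; all other sections are left out.
aws:
  s3:
    buckets:
      enriched:
        good: s3://my-out-bucket/enriched/good        # placeholder path
        archive: s3://my-archive-bucket/enriched      # placeholder path
        # Placeholder path: the folder S3 Loader writes the stream-enriched
        # events to. The presence of this key is what puts EmrEtlRunner
        # into Stream Enrich mode.
        stream: s3://my-loader-bucket/enriched/stream
```

No special command is involved: once the stream bucket is set, EmrEtlRunner is run as usual.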

Thanks very much for your help, that all worked perfectly!