Is this Lambda Architecture possible?


Hi Snowplowers,
I presume that once the batch pipeline is set up successfully, the next step is to move into RT :slight_smile: I came across this great post from @ihor, and then came across other posts mentioning that drip-feeding into Redshift wasn’t yet possible.

Continuing the discussion from How to setup a Lambda architecture for Snowplow:

Am I correct in saying that if I have the batch pipeline set up, I can add the RT stream by setting up
KinesisEnrich --> KinesisGood/Bad Stream --> Kinesis ES Sink --> Kibana? The only difference is that there are two parallel streams, and in future we might not need batch at all once the Redshift drip feed is ready?

I hope I made sense here.



If you set up the real-time pipeline, you’ll get both the Elasticsearch sink (real time) and the Redshift sink (batch) without duplicating architecture.

I’m not sure about the feasibility of drip-feeding Redshift; from the implementations I’ve seen, it never quite works that well. Redshift is an excellent data warehouse but a poor real-time analytics database.


If you already have the batch pipeline set up, there is no equivalent lambda architecture you can add on top of it to bring a real-time load into Elasticsearch. This is because the Clojure Collector cannot feed our real-time pipeline: it’s a fundamentally batch-oriented collector (it rotates logs to S3 hourly).

Your options are:

  1. Set up a complete end-to-end real-time pipeline in parallel (i.e. starting from the Scala Stream Collector onwards)
  2. Rebuild your setup to use the standard Snowplow lambda architecture that you reference


@alex Thanks for your reply.

Maybe I described it wrongly. My setup is:

Scala Stream Collector --> Kinesis S3 --> ETLEMR --> Storage Loader

Based on your suggestion, can I simply add the following?
Existing Scala Stream Collector --> Stream Enrich --> Elasticsearch

Appreciate your time.



Hey @sachinsingh10 - aha, sorry for the confusion! In that case yes, with your current setup:

Scala Stream Collector --> Kinesis Raw Stream --> Kinesis S3 --> ETLEMR --> Storage Loader

just add the Stream Enrich branch:

Scala Stream Collector --> Kinesis Raw Stream --> Kinesis S3 --> ETLEMR --> Storage Loader
                                              --> Stream Enrich --> Elasticsearch
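
As a rough illustration of why the extra branch does not disturb the existing batch flow: each application reading the raw stream keeps its own read position, so the two branches consume the same events independently. The toy model below is only a sketch in plain Python, not Snowplow or Kinesis code; `RawStream`, `Consumer`, and the event names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RawStream:
    """Stand-in for the raw stream: an append-only list of records."""
    records: list = field(default_factory=list)

    def put(self, record: str) -> None:
        self.records.append(record)


@dataclass
class Consumer:
    """Hypothetical consumer that tracks its own read position (checkpoint)."""
    name: str
    checkpoint: int = 0
    seen: list = field(default_factory=list)

    def poll(self, stream: RawStream) -> None:
        # Read only the records after this consumer's own checkpoint.
        new = stream.records[self.checkpoint:]
        self.seen.extend(new)
        self.checkpoint += len(new)


stream = RawStream()
s3_sink = Consumer("kinesis-s3")            # feeds the batch branch
stream_enrich = Consumer("stream-enrich")   # feeds the real-time branch

for event in ["page_view", "link_click", "page_ping"]:
    stream.put(event)

s3_sink.poll(stream)         # batch branch reads the first three events
stream.put("transaction")
stream_enrich.poll(stream)   # real-time branch independently reads all four
s3_sink.poll(stream)         # batch branch catches up on the fourth
```

Both consumers end up with the same four events even though they polled at different times, which is the property the two-branch layout relies on.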


Thanks @alex. What do you recommend I run Stream Enrich --> Elasticsearch on? For example, load-balanced EC2 instances? Is there a minimum configuration you would recommend?