Is this Lambda Architecture possible?


#1

Hi Snowplowers,
I presume that once the batch pipeline is set up successfully, we move on to RT :slight_smile: I came across this great post from @ihor, and then came across other posts mentioning that drip-feeding into Redshift wasn’t yet possible.

Continuing the discussion from How to setup a Lambda architecture for Snowplow:

Am I correct to say that if I have the batch pipeline set up, I can add the RT stream by setting up
KinesisEnrich --> KinesisGood/Bad Stream --> Kinesis ES Sink --> Kibana? The only difference is that there are two parallel streams, and in future we might not need batch at all once the RS drip feed is ready?

I hope I made sense here.

Regards
SS


#2

If you set up the real-time pipeline you’ll get both the Elasticsearch sink (real-time) and the Redshift sink (batch) without duplicating architecture.

I’m not sure about the feasibility of drip-feeding Redshift; in the implementations I’ve seen, it never quite works that well. Redshift is an excellent data warehouse but a poor real-time analytics database.


#3

If you already have the batch pipeline set up, there is no equivalent lambda architecture you can introduce to then bring the real-time load into Elasticsearch. This is because the Clojure Collector cannot feed our real-time pipeline - it’s a fundamentally batch-oriented collector (it rotates to S3 hourly).

Your options are:

  1. Set up a complete end-to-end real-time pipeline in parallel (i.e. starting from Scala Stream Collector onwards)
  2. Rebuild your setup to use the standard Snowplow lambda architecture that you reference

#4

@alex Thanks for your reply.

Maybe I misstated it; my setup is:

Scala Stream Collector --> Kinesis S3 --> ETLEMR --> Storage Loader.

Based on your suggestion, can I simply add the following?
Existing Scala Collector --> Stream Enrich --> Elasticsearch.

Appreciate your time.

Regards
SS


#5

Hey @sachinsingh10 - aha, sorry for the confusion! In that case yes, with your current setup:

Scala Stream Collector --> Kinesis Raw Stream --> Kinesis S3 --> ETLEMR --> Storage Loader

just add a second branch off the raw stream:

Scala Stream Collector --> Kinesis Raw Stream --> Kinesis S3 --> ETLEMR --> Storage Loader
                                              --> Stream Enrich --> Elasticsearch
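
To illustrate why the second branch does not disturb the first, here is a toy sketch of the fan-out above. It is an in-memory model only, not real Kinesis or Snowplow code: `RawStream`, `Consumer` and the event names are hypothetical stand-ins. The assumption it demonstrates is that each consumer application tracks its own read position, so Kinesis S3 and Stream Enrich can read the same raw stream at different paces and still both see every record.

```python
from dataclasses import dataclass, field


@dataclass
class RawStream:
    """In-memory stand-in for the Kinesis raw stream (hypothetical, not a Kinesis client)."""
    records: list = field(default_factory=list)

    def put(self, record):
        # Records are appended and never removed when a consumer reads them.
        self.records.append(record)

    def read_from(self, position):
        # Return everything at or after the given position.
        return self.records[position:]


@dataclass
class Consumer:
    """A consumer application that keeps its own checkpoint into the stream."""
    name: str
    position: int = 0
    seen: list = field(default_factory=list)

    def poll(self, stream):
        # Read only the records this consumer has not seen yet,
        # then advance this consumer's checkpoint (and nobody else's).
        batch = stream.read_from(self.position)
        self.seen.extend(batch)
        self.position += len(batch)
        return batch


raw = RawStream()
s3_sink = Consumer("kinesis-s3")           # batch branch: S3 --> EMR --> Storage Loader
stream_enrich = Consumer("stream-enrich")  # real-time branch: Stream Enrich --> Elasticsearch

for event in ["page_view", "add_to_cart", "purchase"]:
    raw.put(event)

s3_sink.poll(raw)        # the batch branch reads first
raw.put("page_ping")     # a new event arrives in between
stream_enrich.poll(raw)  # the real-time branch still gets all four events
s3_sink.poll(raw)        # the batch branch catches up on the new one
```

After this runs, both `s3_sink.seen` and `stream_enrich.seen` hold the same four events in the same order, which is the property the lambda setup relies on.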

#6

Thanks @alex. What do you recommend I run Stream Enrich --> Elasticsearch on? Load-balanced EC2 instances, for example? Is there any minimum config you would recommend?

Regards
SS