Real-time pipeline


#1

Hello

I want to set up a pipeline like:

Tracker -> Collector -> Enrich -> Kinesis stream -> Cassandra -> Elasticsearch -> Kibana

Can I do this?
If yes, then how?

Thanks


#2

Hi @JeetuChoudhary,

That pipeline is not currently possible:

  • We don’t have an app for sinking Snowplow enriched events from Kinesis into Apache Cassandra
  • A Cassandra->Elasticsearch hop isn’t something we have considered (and it doesn’t sound like a Snowplow-specific component)

We do have a Kinesis Elasticsearch Sink, which will load your enriched events directly into Elasticsearch from Kinesis.


#3

I haven’t done this through Cassandra, but I have been playing with something similar to get enriched Snowplow events from Kinesis into Elasticsearch in near real time. I did it with AWS Lambda functions that read new events from the enriched Kinesis stream (starting from the trim horizon), with Python code to transform the events and send them to your Elasticsearch instance.
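To give a rough idea of the Lambda approach, here is a minimal sketch and not the exact code I run. It assumes the function is attached to the enriched stream through a Kinesis event source mapping (whose starting position can be TRIM_HORIZON or LATEST), that each record is one tab-separated enriched event, and that the Elasticsearch URL is in an ES_ENDPOINT environment variable. FIELD_NAMES only covers the first few enriched-event columns for illustration, and the index name "snowplow" is made up; check both against your own Snowplow and Elasticsearch versions. Older Elasticsearch versions also want a _type in the action line, and the AWS-hosted service may need signed requests, which this sketch does not do.

```python
import base64
import json
import os
import urllib.request

# Assumption: an illustrative prefix of the enriched-event columns only.
# The real enriched event has many more fields; extend this list to match
# the Snowplow version you are running.
FIELD_NAMES = [
    "app_id",
    "platform",
    "etl_tstamp",
    "collector_tstamp",
    "dvce_created_tstamp",
    "event",
    "event_id",
]


def tsv_to_doc(line):
    """Turn one tab-separated enriched event into a dict, skipping empty fields."""
    values = line.rstrip("\n").split("\t")
    return {name: value for name, value in zip(FIELD_NAMES, values) if value != ""}


def build_bulk_body(docs, index="snowplow"):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        # Reusing event_id as the document id makes retried batches overwrite
        # the same documents instead of creating duplicates.
        if "event_id" in doc:
            action["index"]["_id"] = doc["event_id"]
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(body):
    """POST the bulk body to Elasticsearch (ES_ENDPOINT is a hypothetical env var)."""
    request = urllib.request.Request(
        os.environ["ES_ENDPOINT"].rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def handler(event, context):
    """Lambda entry point: Kinesis delivers record payloads base64-encoded."""
    docs = []
    for record in event["Records"]:
        line = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        docs.append(tsv_to_doc(line))
    if docs:
        send_bulk(build_bulk_body(docs))
    return {"indexed": len(docs)}
```

Keeping the transform and the bulk-body building separate from the network call makes it easy to test the mapping locally before deploying.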

I have never used Cassandra before and do not know what your use case is for having it as an intermediary. Assuming it is for medium-term data storage, analysis and reporting, I used Kinesis Firehose to deliver the validated and enriched data to Redshift for that use case, and it has worked out quite nicely for us.
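For the Firehose route, the part that usually needs a little care is how records are handed over. A rough sketch, assuming you push enriched TSV lines to a Firehose delivery stream yourself with boto3: each record is newline-terminated so the lines stay separate when Firehose stages them for the Redshift load, and records are sent in batches of at most 500, which is the per-call record limit of put_record_batch. The stream name is whatever you configured, and the function names here are only for illustration.

```python
def to_firehose_records(tsv_lines):
    """Wrap each enriched TSV line as a Firehose record ending in one newline."""
    return [
        {"Data": (line.rstrip("\n") + "\n").encode("utf-8")}
        for line in tsv_lines
    ]


def batches(records, size=500):
    """Yield slices of at most `size` records (put_record_batch allows up to 500)."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def deliver(tsv_lines, stream_name):
    """Send lines to a Firehose delivery stream and return the failed-record count."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("firehose")
    failed = 0
    for batch in batches(to_firehose_records(tsv_lines)):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=batch,
        )
        # A call can partially fail; failed records should be retried by the caller.
        failed += response["FailedPutCount"]
    return failed
```

On the Redshift side, the delivery stream's COPY options then need to match the tab-delimited format.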

I hope this helps.