I have been working on setting up Snowplow analytics for my company. I decided to go the route of using Kinesis streams, to allow for flexibility. My company's solution requires real-time as well as batch processing of events, which is why I decided to go with the lambda architecture. The only issue is that the lambda architecture shown here is out of date. I was wondering whether the setup I created below makes good use of newer features, or whether I should change my approach. Thanks.
A lot of the S3 business can be done through Kinesis Firehose:
Depending on what you are doing within your shredding, this could come close to replacing your batch processing and the instances needed to run it, and shorten the time it takes.
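To make that a bit more concrete, here is a rough sketch of pointing a Firehose delivery stream at an existing Kinesis stream and landing the records in S3, using boto3. The stream name, ARNs, prefix, and buffering values are all placeholders I made up for illustration, not anything from your setup, and the helper function names are hypothetical. Keep in mind that Firehose writes records back to back without adding delimiters, so check that the resulting files are in a shape your downstream jobs can read.

```python
def build_firehose_request(name, source_stream_arn, bucket_arn, role_arn,
                           prefix="enriched/"):
    """Build keyword arguments for boto3's Firehose create_delivery_stream.

    All argument values are supplied by the caller; nothing here is
    specific to Snowplow.
    """
    return {
        "DeliveryStreamName": name,
        # Read from an existing Kinesis stream instead of direct PUTs.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": source_stream_arn,
            "RoleARN": role_arn,
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": prefix,
            # Flush to S3 when either limit is reached (assumed values).
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
        },
    }


def create_delivery_stream(request):
    """Send the request to AWS. Needs boto3 and valid credentials."""
    import boto3  # imported here so building the request works without it

    client = boto3.client("firehose")
    return client.create_delivery_stream(**request)
```

You would call `build_firehose_request(...)` with your own enriched stream ARN, bucket ARN, and an IAM role that can read the stream and write to the bucket, then pass the result to `create_delivery_stream`.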
Nice use of Elasticsearch. I bet you have some awesome real-time dashboards up there!
I did look into that. Using Firehose is a good suggestion. How would you suggest implementing it? I am not seeing many people who have set it up for Snowplow, so I was not sure how it should be used. Yes, Elasticsearch is working and looking great. Kibana gives a nice real-time dashboard. Thanks for the feedback.
Looks like a great infrastructure to me. You’re persisting events to S3/long-term storage for both the good and bad streams, which is something people often forget.