I’m considering moving to the Snowplow real-time pipeline and am slightly confused by the technology choice.
In particular, I’m wondering why AWS Kinesis was chosen instead of what is (from my point of view) the almost default technology for this task, such as Spark Streaming or the more modern Flink. To me, Kinesis/Kafka looks more like a Pub/Sub wire than a data-processing tool, unlike Spark. But I’m probably missing something, as I don’t have much experience with them.
As far as I can see, nothing prevents us from embedding Scala Common Enrich in a Spark Streaming job, and the application currently called Stream Enrich would then be much smaller.
So I guess my primary question is the following: what design choices led you to use Kinesis/Kafka instead of Spark Streaming or something similar?
Thanks in advance!