Snowplow > Kafka > Druid

Is it possible to set up a pipeline as suggested in the title?

If so, what parts do I need to make this work?

  1. Is this assumption correct?

Scala Stream Collector
Scala Stream Collector installed on two CentOS instances with a load balancer in front of them, collecting the events from the trackers.
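To make the load-balancer part concrete, this is roughly what I have in mind (a sketch only; the instance IPs, port, and the use of nginx are my assumptions, not anything from the Snowplow docs):

```nginx
# Hypothetical nginx load balancer in front of the two collector instances.
upstream snowplow_collectors {
    server 10.0.0.11:8080;   # CentOS instance 1 (assumed address/port)
    server 10.0.0.12:8080;   # CentOS instance 2 (assumed address/port)
}

server {
    listen 80;
    location / {
        proxy_pass http://snowplow_collectors;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```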

Set up the Kafka Sink
As found on: https://github.com/snowplow/snowplow/wiki/Configure-the-Scala-Stream-Collector

The collector.streams.sink.enabled setting determines which of the supported sinks to write raw events to:
"kafka" for writing Thrift-serialized records and error rows to a Kafka topic
You should fill the rest of the collector.streams.sink section according to your selection as a sink.
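Based on that page, my understanding of the relevant section of the collector's `application.conf` is something like the sketch below (the topic names and broker list are placeholders I made up; the exact keys should be checked against the sample config shipped with the collector):

```hocon
# Sketch of the Kafka sink section of the collector config (application.conf).
# Topic names and brokers are assumptions, not defaults.
collector {
  streams {
    good = "snowplow-raw-good"   # raw Thrift-serialized events
    bad  = "snowplow-raw-bad"    # error rows

    sink {
      enabled = "kafka"
      brokers = "kafka1:9092,kafka2:9092"
      retries = 0
    }
  }
}
```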

I then read the Kafka Topic using the thrift extension:
https://druid.apache.org/docs/latest/development/extensions-contrib/thrift.html
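If I read that page correctly, the ingestion side would be a Kafka supervisor spec using the thrift parser from that contrib extension. A rough sketch of what I think it would look like (the datasource, topic, jar path, and Thrift class name are all placeholders; in particular I'd need to find the actual Snowplow collector-payload Thrift class and jar):

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "snowplow_raw",
    "parser": {
      "type": "thrift",
      "jarPath": "collector-payload.jar",
      "thriftClass": "com.example.CollectorPayload",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": { "dimensions": [] }
      }
    },
    "granularitySpec": { "segmentGranularity": "HOUR", "queryGranularity": "NONE" }
  },
  "ioConfig": {
    "topic": "snowplow-raw-good",
    "consumerProperties": { "bootstrap.servers": "kafka1:9092" }
  },
  "tuningConfig": { "type": "kafka" }
}
```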

  2. Where would I find the settings for the Kafka sink?

Finally, I add the JavaScript tracker to my website, and that starts sending events?
3. Can I rename the Snowplow functions so ad blockers don’t pick up sp.js or the fired events?
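For question 3, my understanding is that the standard loader snippet queues calls on a global function whose name you choose, so renaming should just be a matter of picking a neutral global name and self-hosting the script under a different filename. A minimal sketch of the queueing pattern, assuming a renamed global `analytics` and a self-hosted script at `/js/track.js` (both names are my own placeholders):

```javascript
// Sketch of the Snowplow-style loader pattern with a renamed global.
// Assumption: sp.js is self-hosted under a neutral name (e.g. /js/track.js)
// and the global function name is something filter lists won't match.
(function (p, l, o, w, i) {
  if (!p[i]) {
    p.GlobalSnowplowNamespace = p.GlobalSnowplowNamespace || [];
    p.GlobalSnowplowNamespace.push(i);
    // Until the real tracker script loads, calls are queued on p[i].q.
    p[i] = function () { (p[i].q = p[i].q || []).push(arguments); };
    p[i].q = p[i].q || [];
    // In a browser, a <script> tag pointing at `w` would be injected here.
  }
})(globalThis, null, "script", "/js/track.js", "analytics");

// Calls made before the script loads are simply queued:
globalThis.analytics("newTracker", "cf", "collector.example.com", { appId: "web" });
globalThis.analytics("trackPageView");
```

Once the real script loads, it drains the queue, so early calls aren't lost.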

Is this roughly the right setup?

In addition, I think I need this: https://github.com/snowplow/snowplow/tree/master/3-enrich/stream-enrich
Does it read events from the collectors and push them to a Kafka topic, or does Stream Enrich read from the raw Kafka topic instead?
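If Stream Enrich does read from the raw topic, I'd expect its config to name an input topic matching the collector's output and separate output topics, roughly like this sketch (topic names and brokers are my placeholders; the exact keys should be checked against the stream-enrich example config):

```hocon
# Sketch of the Stream Enrich source/sink settings (application.conf).
# Topic names must match whatever the collector writes to.
enrich {
  source = "kafka"
  sink   = "kafka"

  streams {
    in { raw = "snowplow-raw-good" }          # the collector's raw topic

    out {
      enriched = "snowplow-enriched-good"     # enriched events for downstream
      bad      = "snowplow-enriched-bad"      # rows that failed enrichment
    }

    kafka { brokers = "kafka1:9092,kafka2:9092" }
  }
}
```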