How to use Snowplow as a Publish/Subscriber event bus?


#1

Hi, I’m new to snowplow but I was wondering if any of you tried to use its infrastructure as a Publisher/Subscriber architecture where you can not only track the event but also listen to them and at least be able to call a webhook or some other endpoint.

Any ideas?


#2

Hey @guevaraf! This is very doable with the Snowplow real-time pipeline.

The Scala Stream Collector will emit a stream of raw Snowplow events as a Kinesis stream; our Stream Enrich component then listens to that stream of raw Snowplow events, enriches them and writes out the enriched Snowplow events as a second Kinesis stream.

Kinesis is actually a bit more powerful than traditional pub/sub message queues. When creating a Kinesis consumer, I can specify whether I want to start reading a stream from:

  1. TRIM_HORIZON (which is the earliest events in the stream which haven’t yet been expired aka “trimmed”)
  2. LATEST which means the first events which arrive after I start reading
  3. AT_SEQUENCE_NUMBER {x} which means from the event in the stream with the given offset ID
  4. AFTER_SEQUENCE_NUMBER {x} which is the event immediately after 3.
  5. AT_TIMESTAMP to read records from an arbitrary point in time

So a Kinesis stream (like a Kafka topic) is a very special form of database - it exists independently of any consumers. By contrast, most pub/sub systems expect you to create a subscription, and a subscriber is only guaranteed to receive any message published after this point. So that’s more like Kinesis’ LATEST support.

In any case - yes with the Snowplow real-time pipeline you can attach your own pub/sub-style workers to our Kinesis streams. Because the raw event stream is so, well, raw, it’s strongly recommended that you attach to the enriched event Kinesis stream. We even have a couple of SDKs to make reading events from that stream easy in your own apps: