Using Snowplow with PostgreSQL


I was investigating Snowplow, but I got stuck understanding the workflow between Stream Enrich and the relational database. My understanding so far:

  1. Install the NSQ message queue.
  2. Install the Snowplow Stream Collector to collect your Tracker events, which are sent to NSQ.
  3. Install Snowplow Stream Enrich and the Snowplow Iglu Server to enrich the events and write them back to NSQ.
  4. How do the enriched events in NSQ end up stored in the relational database?
  5. Install a Yugabyte cluster as a PostgreSQL-compatible database.

Regarding NSQ: it is typically used in Snowplow Mini only, not in production pipelines. In production, the messaging component is generally handled by Pub/Sub, Kinesis, or Kafka.

Enriched events coming off one of the above streams are then sunk to a database (Snowflake, Redshift, BigQuery, or Postgres are supported) using the respective loader: Redshift and Postgres use the RDB Loader, while Snowflake and BigQuery have their own loaders.
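To make step 4 concrete, here is a minimal sketch of what a loader conceptually does: consume an enriched event from the stream, split the tab-separated payload into columns, and turn it into an INSERT for the events table. The three-field layout and field names below are hypothetical simplifications — real Snowplow enriched events carry far more tab-separated fields — and the statement is only built, not executed; an actual loader would run it through a Postgres driver such as psycopg2 or JDBC.

```python
# Toy sketch of the enrich -> database step. The three-column layout
# is a hypothetical simplification of the enriched TSV format.

def parse_enriched(tsv_line: str) -> dict:
    """Split an enriched-event TSV line into named columns (toy layout)."""
    app_id, event_id, collector_tstamp = tsv_line.rstrip("\n").split("\t")
    return {
        "app_id": app_id,
        "event_id": event_id,
        "collector_tstamp": collector_tstamp,
    }

def to_insert(row: dict):
    """Build a parameterised INSERT for an events table.

    Not executed here; a real loader would pass this to a Postgres
    driver (psycopg2, JDBC, ...) connected to the target database.
    """
    sql = ("INSERT INTO atomic.events (app_id, event_id, collector_tstamp) "
           "VALUES (%s, %s, %s)")
    return sql, (row["app_id"], row["event_id"], row["collector_tstamp"])

# Example enriched line (hypothetical values):
line = "my-app\t4e7f3c2a-0001\t2023-01-01 00:00:00"
sql, params = to_insert(parse_enriched(line))
print(params)  # ('my-app', '4e7f3c2a-0001', '2023-01-01 00:00:00')
```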

I’m not too sure that a Postgres-compatible Yugabyte cluster would work with the current RDB Loader, which uses a Postgres JDBC connection under the hood. It might, but initially it would be safer to use something fully compatible with the Postgres wire protocol.
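If you do want to experiment with YugabyteDB, its YSQL layer speaks the Postgres wire protocol on port 5433 by default (vs Postgres’s 5432), so from the loader’s point of view the only difference would be the connection string. A small sketch — the host, database, and user names below are hypothetical placeholders:

```python
# Sketch: the same libpq-style connection string works for anything
# that speaks the Postgres wire protocol; only host/port differ.
# Hostnames, database, and user below are hypothetical.

def pg_dsn(host: str, port: int, dbname: str, user: str) -> str:
    """Build a libpq-style connection string."""
    return f"host={host} port={port} dbname={dbname} user={user}"

postgres_dsn = pg_dsn("pg.internal", 5432, "snowplow", "loader")
yugabyte_dsn = pg_dsn("yb.internal", 5433, "snowplow", "loader")
print(yugabyte_dsn)  # host=yb.internal port=5433 dbname=snowplow user=loader
```

Whether the RDB Loader itself accepts such a connection cleanly is exactly the open question — wire-protocol compatibility covers the transport, but the loader may rely on Postgres-specific SQL features too.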