I’m interested in running as much of Snowplow on-premise as possible. Currently, I’m stuck on configuring the step that picks up enriched events and loads them into PostgreSQL storage.
Some of the documentation I’ve seen seems to imply that S3 is required for persisting events to storage.
Is there a way to swap out Amazon S3 buckets for an on-premise option when moving enriched events into long-term storage in PostgreSQL?
If S3 is required, has anyone tried using a local Minio instance instead?
FYI, I’m using Kafka as the back end and have the following steps configured:
- Snowplow collector (Scala), writing to Kafka
- Snowplow Stream Enrich, reading from Kafka
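For context, what I ultimately want after those two steps is a small consumer that takes enriched events off the Kafka topic and writes them to PostgreSQL directly, with no S3 hop. A sketch of the parsing half (pure Python, no Kafka or Postgres dependencies; the field names and positions are my assumption about the enriched-event TSV layout and would need checking against the actual schema):

```python
# Sketch of the transform step I have in mind between Kafka and PostgreSQL.
# Enriched events arrive as tab-separated lines; the field positions below
# are my assumption from the enriched-event format and should be verified.
def parse_enriched(line: str) -> dict:
    fields = line.rstrip("\n").split("\t")
    # Map only the first few columns, for illustration.
    names = ["app_id", "platform", "etl_tstamp", "collector_tstamp",
             "dvce_created_tstamp", "event", "event_id"]
    return dict(zip(names, fields))

# A consumer loop would feed each parsed dict into an INSERT against
# PostgreSQL instead of archiving the raw line to S3.
sample = ("web-shop\tweb\t2018-01-01 00:00:01\t2018-01-01 00:00:00\t"
          "2018-01-01 00:00:00\tpage_view\tabc-123")
print(parse_enriched(sample)["event"])  # page_view
```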
Currently, the company I work for uses Mesosphere DC/OS. There is considerable investment in running its systems on DC/OS (locally hosted) and keeping data within its own data centers.
My Snowplow use cases are as follows:
1. Web-click tracking and analysis - the most traditional Snowplow use case.
1a. Track and analyze users’ clicks, page navigation, changes to selected options, and timings on pages and page items, and perform analytics on this information.
2. Serving as a “Messaging Framework” - hence the interest in Kafka.
2a. Audit-only events - capture events from various applications and systems purely for auditing and further analytics.
2b. Serving as a central messaging hub/broker for actionable events (events from one or more systems/apps that require further applications/systems to perform some processing). These would also be audited. Here, minimal latency between putting an event into a collector and having the enriched event available in a Kafka topic will be critical.
I appreciate any comments and guidance.