Snowplow Realtime pipeline with Docker

Hi guys,

I’ve been looking for an easy way to test the Snowplow real-time pipeline on a local machine, mainly just to get a better understanding of the components, but also because I wanted to write a custom enrichment using the JavaScript script enrichment - https://github.com/snowplow/snowplow/wiki/JavaScript-script-enrichment.
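
For anyone who has not used it, the JavaScript script enrichment boils down to a process(event) function that receives the enriched event and returns an array of derived contexts. A minimal sketch (the getter is one of the standard enriched event getters; the schema URI and field are placeholders):

```javascript
// Minimal sketch of a JavaScript script enrichment.
// process(event) receives the enriched event and returns an array of
// derived contexts (self-describing JSONs); an empty array adds nothing.
function process(event) {
  var appId = event.getApp_id(); // standard getter on the enriched event
  if (appId === null) {
    return [];
  }
  // Placeholder Iglu schema - swap in your own derived context schema.
  return [{
    schema: "iglu:com.acme/app_id_context/jsonschema/1-0-0",
    data: {
      appId: appId
    }
  }];
}
```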

Seeing as people were struggling to debug its behaviour here: Best way to test/debug javascript enrichment scripts?, and since I was not satisfied with either bringing up a Scala REPL or running the script in Node.js, I set up the real-time pipeline locally on Docker, extending the example given in the snowplow-docker repo.

This makes it much easier to test custom enrichments, as you only need to restart the scala-stream-enrich container and the enriched data is instantly visible in Kibana.
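
In practice the edit/test loop is just a service restart; assuming a docker-compose setup with the enrich service named scala-stream-enrich (as in this project), something like:

```bash
# Hypothetical edit/test loop: restart only the enrich service after editing
# the JavaScript enrichment, then tail its logs for configuration errors.
docker-compose restart scala-stream-enrich
docker-compose logs -f scala-stream-enrich
```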

I used NSQ and set the buffer to store only 1 record, for testing purposes and instant feedback.
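
For reference, this is the standard Stream Enrich buffer block in config.hocon; a sketch of the relevant fragment (the enclosing enrich.streams section is omitted and the other values are illustrative):

```hocon
buffer {
  byteLimit = 4096   # flush after this many bytes...
  recordLimit = 1    # ...or after every single record, for instant feedback
  timeLimit = 100    # ...or after 100 ms, whichever comes first
}
```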

I hope this can help others in a similar position and stand in as a fast and easily extensible alternative to snowplow-mini.

You can find the project here: https://github.com/kazysgurskas/snowplow-realtime-docker


Great work, @kazgurs1, thanks a ton!

Yes indeed - many thanks for sharing @kazgurs1! A great effort :fireworks:

Thank you for your excellent work!

But I have a problem :frowning: When I start the Docker containers and try to send events to the collector from JS, I get a 500 HTTP status code, although the health check returns OK. Do you know how to solve this?

Screenshots are attached below:


Hi casperWWW,

Anything useful in the collector logs? Try: docker logs stream-collector


Thank you for the tip, kazgurs1! I checked the logs and found that the collector container couldn’t connect to the nsqd container. I put them on the same Docker network and now it works! Thank you again! :wink:
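
For anyone who hits the same thing, the fix was roughly the following (container names match my setup and may differ in yours):

```bash
# Put the collector and nsqd on the same Docker network so they can resolve
# each other, then restart the collector.
docker network create snowplow
docker network connect snowplow nsqd
docker network connect snowplow stream-collector
docker restart stream-collector
```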