Terraform Starter

I recently started a repository with the goal of automating the creation of a fully functional Snowplow Analytics (Scala Streaming) stack in AWS. It’s still in development:

https://github.com/fingerco/snowplow-terraform-starter

The idea is to make it easy for someone to spin up a basic system and then tweak it to fit their use case as their analytics needs grow.

I was wondering whether Snowplow has already started any similar projects, or has any planned for the future. Is there anything you believe to be an “optimal setup” that would be a good starting point for most use cases to pivot from?

Awesome! I’ll check it out. Great idea to automate the deployment!

Nice work @fingerco, thanks for sharing!

We of course have plenty of internal automation around deploying and monitoring Snowplow Insights, our commercial product, but we don’t have any plans to develop and maintain open-source automation scripts for snowplow/snowplow or our other projects.

I’ve also thought about that and might build something similar with Docker + Go :smiley:

Just wanted to mention that I’ve restarted this project with more Terraform under my belt!

https://github.com/fingerco/snowplow-terraform-starter

I’ve also started a blog series that will detail progress as it goes along!

https://pragmacoders.com/part-1-the-snowplow-collector/

Great work @fingerco - thanks so much for sharing!

I’ve added Snowplow Stream Enrich to the Terraform Starter!

https://github.com/fingerco/snowplow-terraform-starter

https://pragmacoders.com/part-2-the-snowplow-streaming-enrichment-process/

If anyone thinks that something is missing or that the configurations could be improved (for people to branch off of), please let me know in this thread or open a PR or issue!

If it’s not appropriate to post updates like this through the forum, let me know!

Update:

I think I have set up the Elasticsearch Loader via Terraform, which completes the starter:

https://github.com/fingerco/snowplow-terraform-starter

I’m going to wait a little bit and build an app with it to confirm that it’s in working condition and does what I want before writing the final tutorial!

I chose Elasticsearch instead of what I’ve used in production (Redshift) because I think Elasticsearch is more affordable at smaller data scales. That said, I still need to learn how to actually use it to know whether it does what I’m looking for.

Let me know if you see any weirdness in this configuration or can think of a better way! I hope the Terraform configuration helps some folks set up their initial Snowplow stack!

Nice work! I’m not sure if the Elasticsearch loader supports ES6 yet, but you can probably upgrade the referenced version from 1.5 to 5.5 (this will give you a few extra features as well as the latest version of Kibana).

Nice!! Thanks for the tip! Changed it from 1.5 -> 5.5!
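For anyone following along, a version bump like this might look roughly like the fragment below. This is only a minimal sketch, not the actual file from the repository: the resource name `snowplow_events`, the domain name, the instance type, and the volume size are all assumptions chosen for illustration. Only the `elasticsearch_version` change from 1.5 to 5.5 comes from this thread.

```hcl
# Hypothetical resource and domain names; not taken from the repository.
resource "aws_elasticsearch_domain" "snowplow_events" {
  domain_name           = "snowplow-events"
  elasticsearch_version = "5.5" # previously "1.5"

  cluster_config {
    instance_type  = "t2.small.elasticsearch" # assumed small instance for a starter stack
    instance_count = 1
  }

  # t2 instance types need EBS-backed storage; the size here is an assumption.
  ebs_options {
    ebs_enabled = true
    volume_size = 10
  }
}
```

Check what `terraform plan` reports for a version change on an existing domain before applying it.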

By watching Kibana, I have confirmed that this pipeline works and was created entirely through these Terraform configurations. Kibana shows events as soon as they happen:

Scala Stream Collector -> Scala Stream Enrich -> Elasticsearch Loader -> Elasticsearch -> Kibana
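As a rough illustration of how these stages could be wired together in Terraform, here is a sketch that assumes the collector, enrich, and loader hand events to each other through Kinesis streams. The stream names and shard counts are hypothetical and are not copied from the repository.

```hcl
# Assumed hand-off: Scala Stream Collector -> Scala Stream Enrich.
# Stream names and shard counts are illustrative only.
resource "aws_kinesis_stream" "collector_good" {
  name        = "snowplow-collector-good"
  shard_count = 1
}

# Assumed hand-off: Scala Stream Enrich -> Elasticsearch Loader.
resource "aws_kinesis_stream" "enriched_good" {
  name        = "snowplow-enriched-good"
  shard_count = 1
}
```

Each component's own configuration would then reference the stream it reads from and the stream it writes to, so the order in the chain above is determined by those stream names.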

Hey @fingerco,

This looks really interesting and is exactly what I am looking for. However, I cannot access the GitHub link. Did you move the repository?

Cheers

Matthias

I believe it has moved here: https://github.com/13scoobie/snowplow-terraform-starter

Snowplow has now published their own set of Terraform Modules to help you get started! Initially available on AWS, with GCP coming soon.