Deploying Snowplow on Kubernetes

Hi,

Is anyone using Kubernetes to deploy Snowplow?

How has your experience been? Are there any existing Docker or Kubernetes YAML files around?

I'm looking for something based on the incremental Kafka pipeline. If it doesn't require S3, tests can be run on a local desktop.

The goal is to hook up Spark and a SQL database.
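For local desktop testing, one cheap smoke test is to fire a hand-built page view at the collector and check that it flows through. Below is a minimal sketch assuming the collector is reachable at `http://localhost:8080` (hypothetical, e.g. via `kubectl port-forward`) and using the collector's `/i` GET endpoint with a small subset of tracker-protocol parameters. The app ID default and tracker-version label are made up for illustration.

```python
import time
import uuid
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address: a collector reached via `kubectl port-forward` on a laptop.
COLLECTOR = "http://localhost:8080"


def page_view_url(collector: str, page_url: str, app_id: str = "local-test") -> str:
    """Build a GET request for the collector's /i endpoint carrying one page view."""
    params = {
        "e": "pv",                            # event type: page view
        "url": page_url,                      # URL of the viewed page
        "p": "web",                           # platform
        "aid": app_id,                        # application ID (hypothetical default)
        "tv": "manual-0.0.1",                 # tracker version label; arbitrary here
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp in milliseconds
    }
    return f"{collector.rstrip('/')}/i?{urlencode(params)}"


def send_page_view(collector: str, page_url: str) -> int:
    """Fire the request and return the HTTP status code from the collector."""
    with urlopen(page_view_url(collector, page_url), timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    # Print only; call send_page_view(COLLECTOR, ...) once a collector is running.
    print(page_view_url(COLLECTOR, "https://example.com/"))
```

A healthy collector should answer `send_page_view` with a 200, after which the event should show up in whatever raw/enriched Kafka topics the pipeline is configured with.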


Hi iFire,
Our team has the same need but has not found any useful information so far.
Did you find anything recently?
We can catch up and talk about it.

We abandoned snowplow since there was no response here.

What tools did you move to? I recently found out that even GitLab is using Snowplow.

Hi @kuangmichael07 ,

You can find out how to deploy Snowplow on AWS and GCP on our docs website:

Please do not hesitate if you have any questions!

Note that if you try deploying Snowplow on Kubernetes, you might run into an issue with IAM roles: snowplow-collector-scala does not use IAM Role for Service accounts in a container · Issue #186 · snowplow/stream-collector · GitHub

Edit: that issue is easily fixed, though; just make sure to specify default
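When chasing this kind of IAM problem, it can help to check from inside the collector pod whether IRSA is wired up at all. The EKS pod identity webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into pods whose service account carries a role annotation; if they are missing, no SDK credential chain has anything to pick up. A rough diagnostic sketch (run it with any Python available in the pod or a debug sidecar; the function name is hypothetical):

```python
import os
from typing import Mapping, Optional

# Env vars the EKS pod identity webhook injects when a pod's service account
# is annotated with an IAM role.
ROLE_VAR = "AWS_ROLE_ARN"
TOKEN_VAR = "AWS_WEB_IDENTITY_TOKEN_FILE"


def irsa_status(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report whether this process looks able to use IAM Roles for Service Accounts."""
    env = os.environ if env is None else env
    token_file = env.get(TOKEN_VAR)
    return {
        "role_arn": env.get(ROLE_VAR),
        "token_file": token_file,
        # The SDK reads this projected token to call sts:AssumeRoleWithWebIdentity.
        "token_readable": bool(token_file) and os.access(token_file, os.R_OK),
        # Static keys in the environment are checked before web identity in the
        # AWS SDKs' default chains, so they would shadow the pod's role.
        "static_keys_set": "AWS_ACCESS_KEY_ID" in env,
    }


if __name__ == "__main__":
    for key, value in irsa_status().items():
        print(f"{key}: {value}")
```

If `role_arn` is set and the token is readable but the collector still fails, the problem is more likely in how the application builds its credentials provider than in the cluster setup.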

I’m running the full snowplow stack on Kubernetes. I can answer specific questions.


We are working internally on migrating everything to run properly on top of EKS, and we will also be looking to release Helm charts along with it, so we can also answer questions on how to get started with Snowplow components on EKS!


Hi @brad.inscoe, I am currently trying to move Snowplow from GCE to GKE but haven't found many resources or tutorials online. Would you mind sharing any steps or scripts? Thanks!

We are working on getting Snowplow deployed in k8s in our own data centers.
It has not been an easy journey. We have had to write our own Helm charts and extend the Docker images to get the right log format, and we still have not gotten metrics in from all components.

Hi Josh,

I don’t suppose you have any news on when this might be available? We are hoping to start migrating to EKS this year, and this would be of huge benefit to us.

Many Thanks,
Rob

This is one of the few resources writing about a Snowplow Kubernetes deployment. Maybe it’s helpful for you:

https://www.datascienceengineer.com/blog/post-ha-snowplow-on-k8

Hey Rob! All the services should actually already work on EKS now. We have not set up Helm charts for everything, but we did create some helpful generic charts that you can use to deploy the various services.

You can use the OIDC authentication scheme (IAM Roles for Service Accounts) to bind IAM roles to the pods, granting them access to the services they need, such as Kinesis and S3. We haven't written a blog post on this yet, but as far as I am aware it should all work now.
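Binding a role to a pod this way takes two pieces: an IAM trust policy that lets one Kubernetes service account assume the role through the cluster's OIDC provider, and a ServiceAccount annotated with that role's ARN. A minimal sketch that prints both as JSON; the account ID, OIDC issuer, namespace, service-account name and role name are all hypothetical placeholders:

```python
import json

# Hypothetical values: replace with your account, cluster OIDC issuer, and names.
ACCOUNT_ID = "123456789012"
OIDC_PROVIDER = "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"
NAMESPACE = "snowplow"
SERVICE_ACCOUNT = "snowplow-collector"


def trust_policy(account_id: str, oidc_provider: str, namespace: str, sa_name: str) -> dict:
    """IAM trust policy letting one Kubernetes service account assume the role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{sa_name}",
                f"{oidc_provider}:aud": "sts.amazonaws.com",
            }},
        }],
    }


def service_account(namespace: str, sa_name: str, role_arn: str) -> dict:
    """Kubernetes ServiceAccount annotated so EKS injects credentials for role_arn."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": sa_name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    role_arn = f"arn:aws:iam::{ACCOUNT_ID}:role/{SERVICE_ACCOUNT}"
    print(json.dumps(trust_policy(ACCOUNT_ID, OIDC_PROVIDER, NAMESPACE, SERVICE_ACCOUNT), indent=2))
    print(json.dumps(service_account(NAMESPACE, SERVICE_ACCOUNT, role_arn), indent=2))
```

The trust policy goes on the IAM role (alongside a permissions policy for Kinesis, S3, etc.), the ServiceAccount can be applied with `kubectl apply -f`, and each Deployment then references it via `serviceAccountName` in its pod spec.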

Are there particular resources missing that would be helpful for you to get started moving across?

Hi, I have created a Helm chart: github.com/lukaspastva/helm-snowplow
However, it is still not working for me because snowplow-postgres-loader does not support Kafka as an input.
Please note that I'm talking about a purely k8s setup: even the big data part lives in Postgres, and instead of Pub/Sub I use Strimzi.

Is anyone able to help develop this feature for snowplow-incubator/snowplow-postgres-loader (Real-time Postgres Loader) on GitHub?

Feel free to contact me.
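For anyone reproducing this all-Kubernetes layout with Strimzi in place of Pub/Sub, the Kafka topics between components can be declared as Strimzi `KafkaTopic` resources instead of being created by hand. A sketch that emits them as one `v1` List for `kubectl apply -f -`; the topic names, the `my-cluster` Kafka resource name and the partition/replica counts are hypothetical, and this does nothing about the missing Kafka input in the Postgres loader:

```python
import json

STRIMZI_API_VERSION = "kafka.strimzi.io/v1beta2"

# Hypothetical topic names for a collector -> enrich -> loader pipeline on Kafka.
PIPELINE_TOPICS = ("collector-good", "collector-bad", "enriched-good", "enriched-bad")


def kafka_topic(name: str, cluster: str, partitions: int = 3, replicas: int = 1) -> dict:
    """A Strimzi KafkaTopic resource; the Topic Operator creates the matching Kafka topic."""
    return {
        "apiVersion": STRIMZI_API_VERSION,
        "kind": "KafkaTopic",
        "metadata": {
            "name": name,
            # Tells the Topic Operator which Strimzi Kafka cluster owns this topic.
            "labels": {"strimzi.io/cluster": cluster},
        },
        "spec": {"partitions": partitions, "replicas": replicas},
    }


def topic_list(cluster: str, names=PIPELINE_TOPICS, **kwargs) -> dict:
    """Wrap all topics in a v1 List so a single `kubectl apply -f -` creates them."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [kafka_topic(n, cluster, **kwargs) for n in names],
    }


if __name__ == "__main__":
    # "my-cluster" is a placeholder for the name of your Strimzi Kafka resource.
    print(json.dumps(topic_list("my-cluster"), indent=2))
```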