Your experiences with Snowplow and suitable open source alternatives

We are thinking about building the data pipeline for our analytics tool CAP10 (for the media industry, 5-10 million events/month in the initial phase) with Snowplow, but we’re currently also looking for alternatives.
My question: What are your experiences with Snowplow and which suitable open source alternatives are currently available on the market?

Hi @Mike7L, from my experience 5-10 million events per month is definitely doable for starters, and I'm sure you can scale it up later on. The only other fully open source alternative I know of that serves the same purpose is Matomo (formerly Piwik), but it isn't as flexible as Snowplow; I've written about it a bit here.

@Mike7L - We're running a number of pipelines at work, and the largest is currently handling over 600M events/day. The system scales quite nicely, to the point where it's easy to forget about the pipeline itself and direct all attention to downstream modeling.

Funny you mention Piwik @evaldas - the early Snowplow JS tracker was inspired by it but diverged pretty quickly.

Hi Micha

We have just started to use Snowplow ourselves, so I'm not sure that really qualifies us to give good advice.

Hi Evaldas,
Thanks for the detailed article - it helped me a lot. Can you please explain how you came up with $5000/month for Snowplow? Also, how did StackTome reduce it to $500/month?

Hi @mani01
Welcome to the Snowplow Community.
Cheers,
Eddie

Thanks @EddieM


We've used Snowplow for a similar range (10 million events a month) and it is very robust and scalable. It's also not very difficult to maintain, and the systems are very resilient - even when something goes wrong, the events are usually not lost, just stuck in one queue or another waiting to be reprocessed. I believe it cost us around $500-700 per month, including the Redshift instance and all the pipeline instances.

However, it's hard to use without a good database model - I believe there is a good inbuilt model for web, but for mobile, it's very rudimentary and we had to do a lot of work on top of that. That makes it much harder for everyone on the team (especially those not familiar with SQL) to interact with the data daily and draw insights. We got quite far with Metabase on top of the Snowplow stack, but we couldn't achieve true data freedom.
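To give a sense of the SQL this involves: below is a minimal sketch of the kind of query you end up writing against the enriched atomic.events table when there is no modeled layer on top. Column names follow the standard Snowplow Redshift schema as I understand it (app_id, derived_tstamp, event_name, domain_userid), but treat them as assumptions and check your own warehouse.

-- Daily page views and unique users per app, straight from atomic.events.
-- Assumes the standard Snowplow Redshift schema; names may differ slightly
-- depending on your loader version.
SELECT
    app_id,
    DATE_TRUNC('day', derived_tstamp) AS event_day,
    COUNT(*)                          AS page_views,
    COUNT(DISTINCT domain_userid)     AS unique_users
FROM atomic.events
WHERE event_name = 'page_view'
  AND derived_tstamp >= DATEADD(day, -30, CURRENT_DATE)
GROUP BY 1, 2
ORDER BY 1, 2;

Simple enough for an analyst, but it's exactly the kind of thing non-SQL teammates can't self-serve, which is why we put Metabase in front of it.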

We are currently evaluating PostHog, and it is very promising. It is opinionated, so if you need something very custom, Snowplow is probably the better fit; but if you want something along the lines of Mixpanel/Amplitude - a friendly UI and modeling out of the box, but open source - PostHog might be worth a look.


We’ve now got a mobile data model available for sql-runner, and it’ll be arriving on dbt very soon.

EDIT: Read what @Colm said below, far more detail than me!

As for sending events to other destinations, we've recently started investing effort into supporting Google Tag Manager Server Side. It's a great complementary piece of tech for Snowplow. We've already published a number of tags, and we might even have a PostHog one soon too :wink:

Hey @sqlhorror thanks very much for the honest appraisal!

I believe there is a good inbuilt model for web, but for mobile, it’s very rudimentary and we had to do a lot of work on top of that.

Just a note on this - we’ve worked a lot on data models recently, and we continue to release improvements, so I wanted to take the opportunity to point some of those out:

We have a relatively new mobile data model, which does for mobile what the web model does for web: https://docs.snowplowanalytics.com/docs/modeling-your-data/the-snowplow-mobile-model/

It’s only available on sql-runner at present, but we’re currently working on a release for dbt, which should come out in the coming weeks/months.

We’ve also made massive improvements to the web model, which is now more easily customisable and built for scale & performance.

That’s available on web for dbt: https://docs.snowplowanalytics.com/docs/modeling-your-data/the-snowplow-web-data-model/dbt-web-data-model/

And sql-runner: https://docs.snowplowanalytics.com/docs/modeling-your-data/the-snowplow-web-data-model/sql-runner-web-data-model/

Customising them still involves some work, and the configuration can be hard to grapple with (because the models do a lot of complex work under the hood) - I won't pretend it's as easy out of the box as more opinionated solutions. As you point out, that ease of use is the trade-off for flexibility and the ability to customise.
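To give a rough idea of what building on top of the dbt web model looks like, here's a minimal sketch of a custom downstream model. The derived table name (snowplow_web_sessions) and its columns (start_tstamp, page_views) are assumptions based on the docs linked above - check the exact names for your version of the package.

-- models/custom/daily_traffic.sql (hypothetical custom dbt model on top of
-- the Snowplow web model's derived sessions table; table and column names
-- are assumptions - see the linked docs for your version).
SELECT
    DATE_TRUNC('day', start_tstamp)  AS session_day,
    COUNT(*)                         AS sessions,
    SUM(page_views)                  AS page_views
FROM {{ ref('snowplow_web_sessions') }}
GROUP BY 1
ORDER BY 1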

I'm also not here to try to talk you out of looking into alternatives - of course you should find what works best for you. I just wanted to mention these in case you didn't know about them (they aim to solve exactly the pain you describe), and, if I'm honest, partly out of pride (I hope you'll forgive me that!). :slight_smile:

Hi @sqlhorror
Welcome to the Snowplow Community. We ran an Office Hours session on modeling your Snowplow data in dbt.
Cheers,
Eddie

Hi @mani01, when it comes to clickstream pipelines, pricing is not absolute - it depends on data volume, velocity, variety, technical support, and so on. At StackTome our starting price is $500/month, which covers up to a certain volume. If you want to discuss a concrete case, we can follow up via email: e.miliauskas@stacktome.com

Cheers,
Evaldas
