Redshift + Snowplow Mini version 4


#1

A bit lost here. I have everything in #5 and #6 set up and working, but now I would like my events sent to my Redshift database. The tables are already created.

On this link:

https://github.com/snowplow/snowplow-mini/wiki/Setup-guide-part-2-(for-later-versions)

Where it says:

  1. Add the IP of Snowplow-mini to your Redshift whitelist
  2. Enter the required details (AWS access key, secret key, Redshift host, port, database name, username, password and schema name) via the setup bash script

Can someone give a bit more insight into how to get the events from my Snowplow Mini sent to my Redshift database? My tables are all created; I'd just like my page views, etc. loaded into Redshift.

Any input?


#2

Hi @prodigy - I’ve replied to your question here:

Unfortunately you will have to set up a Snowplow batch pipeline to load into Redshift. Sorry for the confusion; we've deleted that wiki page now.


#3

Has this changed? Is there any way to get my Snowplow Mini data into a database? Thanks for letting me know.


#4

Hi @RedMapleMedia,

No, nothing changed on that front unfortunately.


#5

For my own edification and a better understanding of the Snowplow Mini setup: what's the reason it doesn't support loading into Redshift? It sounds like it's technically incompatible with Redshift loading. I'm a bit of a noob here, so I'm not sure I fully understand how that can be the case. Many thanks!


#6

Hey @pearsonhenri,

The genuine answer to that question is that Snowplow Mini is something which organically grew from an idea by chance, rather than something we sat down, designed and developed as a product.

We at Snowplow had a hackathon a few years ago, and one of the ideas was a small-scale demo version of Snowplow, so that we could show prospective customers a concrete example of what Snowplow is. It worked really well, and we realised that an out-of-the-box, small and cheap version of Snowplow that runs in a sandbox is a really useful testing tool.

So we ran with it and it’s incredibly useful to that end. The latest version of Mini allows you to test everything about your tracking setup, including enrichments.

The idea of loading from Mini to a storage target has come up a few times and been knocked around. I don't have a role in setting the roadmap, so I can't speak to why we haven't given it a good go, but I'd guess there's a cost-benefit judgment there. We've got a LOT of development going on in the core Snowplow product: we've recently released the GCP pipeline, we've been improving loading to different storage targets, and we've got a big initiative to refactor bad rows. So in short, I think it's probably just that we build and maintain a lot of software, and unfortunately not every good idea is something we can focus on.

Another personal opinion of mine is that once you get to the point where you're loading to a storage target, you're essentially just building an unscalable Snowplow pipeline (i.e. it'll break at volume). So if that's where you're at, why not just build the full pipeline? In other words, I think the use case for having SP Mini load data is pretty close to the use case for the full pipeline. I'm not sure it's terribly common to need the former without being likely to need the latter too.

Good question, a bit of Snowplow folklore there for you!