Redshift + Snowplow Mini version 4

prodigy · December 22, 2017, 11:46pm

A bit lost here. I have everything in #5 and #6 set up and working, but now I would like my events sent to my Redshift database. The tables are created.

On this link:

Where it says:

Add the IP of Snowplow-mini to your Redshift whitelist
Enter the required details (AWS access key, secret key, redshift host, port, database name, username, password and schema name) via the setup bash script

Can someone provide a bit more insight into how to get events from my Snowplow Mini into my Redshift database? My tables are all created; I would just like my pageviews, etc. to show up in Redshift.

Any input?

alex · December 23, 2017, 5:28pm

Hi @prodigy - I’ve replied to your question here:

Unfortunately you will have to setup a Snowplow batch pipeline to load into Redshift; sorry for the confusion, we’ve deleted that wiki page now.

RedMapleMedia · December 3, 2018, 2:59am

Has this changed? Is there any way to get my Snowplow Mini data into a database? Thanks for letting me know.

anton · December 3, 2018, 7:00am

Hi @RedMapleMedia,

No, nothing changed on that front unfortunately.

pearsonhenri · January 23, 2019, 11:40am

For my own personal edification/improved understanding of the Snowplow Mini setup, what’s the reason that it doesn’t support loading into Redshift? It sounds like it’s technically incompatible with Redshift loading. A bit of a noob here but I’m not sure I fully understand how that can be the case. Many thanks!

Colm · January 23, 2019, 12:24pm

Hey @pearsonhenri,

The genuine answer to that question is that Snowplow Mini is something which organically grew from an idea by chance, rather than something we sat down, designed and developed as a product.

We at Snowplow had a hackathon a few years ago, and one of the ideas was a small-scale demo version of Snowplow, so we could show prospective customers a concrete example of what Snowplow is. It worked really well, and we realised that an out-of-the-box, small and cheap version of Snowplow which runs in a sandbox is a really useful testing tool.

So we ran with it and it’s incredibly useful to that end. The latest version of Mini allows you to test everything about your tracking setup, including enrichments.

The idea of loading from mini to a storage target is something that’s come up a few times, and been knocked around. I don’t have a role in setting the roadmap or anything so I can’t speak to why we haven’t given that a good go, but I can guess that there’s a cost-benefit judgment there. We’ve got a LOT of development on the core Snowplow product - we’ve recently released the GCP pipeline, we’ve been improving loading to different storage targets, and we’ve got a big initiative to refactor bad rows. So in short I think it’s probably just because we build and maintain a lot of software and unfortunately not every good idea is something we can focus on.

Another personal opinion of mine is that once you get to the point where you're loading to a storage target, you're essentially just building an unscalable Snowplow pipeline (i.e. it'll break at volume). So if that's where you're at, why not just build the full pipeline? In other words, I think the use case for having SP Mini load to storage is pretty close to the use case for the full pipeline. I'm not sure it's terribly common to need the former without being likely to need the latter too.

Good question, a bit of Snowplow folklore there for you!
