How to organise data warehouse to allow building dynamic funnels?

#1

I’m looking for the proper way of storing data in the warehouse, that allows users creation of dynamic fynnels via Looker (similar idea as we can see in Mixpanel). So I have a lot of event in atomic.event table, some custom events in nearby tables and I can not find a answer to question, how to put them all together, to enable building dynamic (user defined) funnels. I mean that users decide how many events in funnel they put and what order it will be. Does anybody have such experience with Snowplow data?

#2

This is an interesting question I have been thinking about for years. In a funnel it’s important that the user numbers in one step have done the events before it earlier in the funnel in that order.

This is something very tricky to do with relational SQL as it would involved left join after left join for each subsegment of a stage in the funnel to grab only the users who did the next event. I have found this to be heavy to query and hard to template.

Amplitude have found a performant way to do this with AWS redshift under the hood but this is proprietary and extremely expensive at scale.

At the last London meetup Dylan from Snowplow presented a very interesting talk about porting snowplow data into a graph database with a specialised snowplow schema to hold the information. A graph database would be a very elegant solution for this as the structure works on the joins instead of the columnar storage making the query style above much more performant as well as creating sankey diagrams etc.

Hopefully there is some news of this coming soon from snowplow?

1 Like
#3

@brucey31 We are using redshift, so it is still intersting to know the way that was found for Redshift. Do Amplitude post some details on it? Blog or talks?

#4

If we all know how Amplitude did it they wouldn’t be a series D hundred million+ dollar company.

Here is a useful link to describe what I was saying about how to do it with the joins. It will require lots of the right distkeys & sortkeys to get any type of performance out of it using redshift.

#5

I was supposing that it is quite well-known problem, taking a look that Mixpanel and GA allows building dynamic funnels and this is not killer feature of Amplitude, that is why I was asking. Thanks, for the link, will check it!

#6

Hey @sphinks,

Here’s a (quite old) cookbook for funnels using snowplow. It should give you a good idea of how to get where you need to go, if not a comprehensive guide.

It’s complex to get your head around how to shape the data, but it’s not the most difficult thing I’ve had to do, it’s possible. Note that depending on what tool you’re using, the underlying table for a funnel viz would have a different shape.

In terms of performance - if you’re querying atomic data over a long time then yeah it likely takes a while. But in general it’s recommended to have some model that aggregates that data to a meaningful level which comes before doing this kind of thing. For example it’s straightforward enough to build a funnel off the web model’s page views table compared to straight from the atomic data.

Hope that helps, best of luck with it!

#7

I almost forgot - you should also take a look at Indicative! That tool is built for this exact purpose, and plays nicely with Snowplow via the Indicative Relay.

Definitely worth taking a look at that tool. :slight_smile: