Snowflake DB loader versus Snowpipe for storage step

Hi there,
I’m nearly up to the storage step of the Snowplow setup, and am wondering what the benefits are of using the Snowflake DB loader instead of Snowflake’s Snowpipe feature (to pick up and ingest the good enriched files generated by the enrich process)?

Cheers
Ryan

The Snowflake DB loader keeps a manifest of what data (and what columns) has been loaded by Snowplow into Snowflake. This means that if you start sending a new event / context into your pipeline, the loader can alter your atomic.events table before loading the data, so you don’t end up with ‘column not found’ style errors.
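
To make that concrete, here is a rough sketch of the effect described above, not the loader’s actual internals. The context column name `contexts_com_acme_my_context_1` is a made-up example:

```sql
-- Illustrative sketch only: when a batch contains a context that has no
-- matching column yet, the loader effectively widens the table first.
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;

-- Then the batch is loaded; no 'column not found' error occurs because
-- the new column already exists. Stage name is hypothetical.
COPY INTO atomic.events
  FROM @snowplow_enriched_stage
  FILE_FORMAT = (TYPE = 'JSON')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```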

As far as I know this pattern isn’t easy to establish with Snowpipe, which is designed more for loads into tables that don’t change dynamically; if they do change, the expectation is that you run manual ALTER PIPE and ALTER TABLE statements.
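
For comparison, a typical Snowpipe setup might look like the hypothetical sketch below (stage and pipe names are made up). The point is that the pipe’s COPY statement is fixed at creation time, so schema evolution is manual:

```sql
-- A pipe bound to a fixed COPY statement over a stage.
CREATE PIPE atomic.events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO atomic.events
  FROM @snowplow_enriched_stage
  FILE_FORMAT = (TYPE = 'JSON')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- If the event schema gains a new context, you evolve the table by hand
-- (and recreate the pipe if its COPY statement needs to change):
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;
ALTER PIPE atomic.events_pipe REFRESH;
```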

Thanks @mike, I really appreciate it. Snowflake DB loader it is!

I dug a little deeper and eventually found the documentation covering the functionality you have described: https://github.com/snowplow-incubator/snowplow-snowflake-loader/wiki

Thanks again,
Ryan

Wouldn’t using Snowpipe instead of the Snowflake Loader also mean missing out on event deduplication? Or does event deduplication already happen during stream enrichment?

Hi @medicinal-matt ,

There is no deduplication in the enrichment stream, but you can activate it in the Snowflake Loader. Instructions can be found here.
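
As a conceptual illustration of what deduplication means here (not the loader’s actual implementation): Snowplow deduplication keys on `event_id`, together with `event_fingerprint` where available. A hand-rolled SQL equivalent would keep one row per key:

```sql
-- Keep the earliest-collected row for each (event_id, event_fingerprint)
-- pair; later duplicates are dropped. Sketch only, under the assumption
-- above about the dedup keys.
CREATE OR REPLACE TABLE atomic.events_deduped AS
SELECT *
FROM atomic.events
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id, event_fingerprint
  ORDER BY collector_tstamp
) = 1;
```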
