Snowflake DB loader versus Snowpipe for storage step

Hi there,
I’m nearly up to the storage step of the Snowplow setup, and am wondering what the benefits are of using the Snowflake DB loader instead of Snowflake’s Snowpipe feature (to pick up and ingest the good enriched files generated by the enrich process)?

Cheers
Ryan

The Snowflake DB loader keeps a manifest of what data (and what columns) has been loaded by Snowplow into Snowflake. This means that if you start sending a new event / context into your pipeline, the loader can alter your atomic.events table before loading the data, so you don’t end up with ‘column not found’ style errors.
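
To make that concrete, here is a rough sketch of the effect described above, not the loader’s actual internals. The context column name `contexts_com_acme_my_context_1` is a made-up example:

```sql
-- Illustrative sketch only: when a batch contains a context that has no
-- matching column yet, the loader effectively widens the table first.
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;

-- Then the batch is loaded; no 'column not found' error occurs because
-- the new column already exists. Stage name is hypothetical.
COPY INTO atomic.events
  FROM @snowplow_enriched_stage
  FILE_FORMAT = (TYPE = 'JSON')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```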

As far as I know this pattern isn’t easy to establish with Snowpipe, which is designed more for loads into tables that don’t change dynamically; if they do change, the expectation is that you run manual ALTER PIPE and ALTER TABLE statements.
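
For comparison, a typical Snowpipe setup might look like the hypothetical sketch below (stage and pipe names are made up). The point is that the pipe’s COPY statement is fixed at creation time, so schema evolution is manual:

```sql
-- A pipe bound to a fixed COPY statement over a stage.
CREATE PIPE atomic.events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO atomic.events
  FROM @snowplow_enriched_stage
  FILE_FORMAT = (TYPE = 'JSON')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- If the event schema gains a new context, you evolve the table by hand
-- (and recreate the pipe if its COPY statement needs to change):
ALTER TABLE atomic.events
  ADD COLUMN contexts_com_acme_my_context_1 VARIANT;
ALTER PIPE atomic.events_pipe REFRESH;
```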

Thanks @mike, I really appreciate it. Snowflake DB loader it is!

I dug a little deeper and eventually found the documentation covering the functionality you have described: https://github.com/snowplow-incubator/snowplow-snowflake-loader/wiki

Thanks again,
Ryan

Wouldn’t using Snowpipe instead of the Snowflake Loader also mean missing out on event deduplication? Or does event deduplication already happen during stream enrichment?

Hi @medicinal-matt ,

There is no deduplication in the enrichment stream, but you can activate it in the Snowflake Loader. Instructions can be found here.
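
As a conceptual illustration of what deduplication means here (not the loader’s actual implementation): Snowplow deduplication keys on `event_id`, together with `event_fingerprint` where available. A hand-rolled SQL equivalent would keep one row per key:

```sql
-- Keep the earliest-collected row for each (event_id, event_fingerprint)
-- pair; later duplicates are dropped. Sketch only, under the assumption
-- above about the dedup keys.
CREATE OR REPLACE TABLE atomic.events_deduped AS
SELECT *
FROM atomic.events
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id, event_fingerprint
  ORDER BY collector_tstamp
) = 1;
```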
