Cloud Storage Loader Output Schema

Hello there!

I am trying to eventually ingest our Snowplow events into Snowflake, with the full pipeline implemented in GCP (Scala Stream Collector → Stream Enrich PubSub → Cloud Storage Loader). Since the Snowflake Loader doesn’t currently support GCP, we plan to sink our stream into a bucket via the Cloud Storage Loader, and then load it into Snowflake via Snowpipe.

Can anyone explain what the data output format will be once it is sunk into the bucket? Is it one super-wide file? Shredded into atomic, context, and custom event tables? CSV? TSV? Any clarification would be much appreciated!

Thanks so much!

Yep - the data that comes out of enrichment is in a wide TSV format. This format is consumable by any of the Analytics SDKs as well as by the shredder process.
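To make the wide TSV format concrete, here is a minimal sketch of splitting one enriched event line into named fields. The field names below follow the canonical event model's leading columns, but the exact set and order depend on your pipeline version (the full record has well over a hundred fields), and in practice you'd use one of the Analytics SDKs rather than hand-rolling this:

```python
# Sketch: parsing one line of Snowplow enriched wide-TSV output.
# Assumption: these are the first seven columns of the canonical
# enriched event; the real record continues with many more fields.
FIELDS = [
    "app_id", "platform", "etl_tstamp", "collector_tstamp",
    "dvce_created_tstamp", "event", "event_id",
]

def parse_enriched(line: str) -> dict:
    """Split a wide-TSV enriched event into a dict of its leading fields."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

# Hypothetical sample line for illustration only
sample = ("my-app\tweb\t2023-01-01 00:00:00\t2023-01-01 00:00:01\t"
          "2023-01-01 00:00:00\tpage_view\tf6c2e3a1")
event = parse_enriched(sample)
print(event["event"])  # page_view
```

The Analytics SDKs do essentially this, plus transform the TSV into self-describing JSON suitable for warehouse loading.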

The Snowflake model has its own dedicated shredder that runs on Spark, but I imagine it’s probably portable to GCP with some changes. There’s also a stream shredder (not dependent on Spark), but that’s not in a production-ready state yet.