Difference between events in the shredded/archive/<date>/atomic-events/ and enriched/archive/<date> folders

pocin · August 6, 2018, 10:25am

I’ve been wondering what the difference is between the CSVs in
shredded/archive/run=<date>/atomic-events/
vs
enriched/archive/run=<date>

Is shredded/../atomic-events a subset of the atomic events found in enriched/archive, or are there more events in enriched/archive than in shredded/?

Or is the enriched/... folder merely an intermediate step, so that the shredded/... folder can be considered the “final” destination that contains all events (from the canonical event model + [un]structured events) and from which the RDB loader eventually takes the events?

ihor · August 6, 2018, 10:38pm

@pocin, the (shredded) atomic_events are the canonical events in TSV format that end up in the events table. The enriched events contain, apart from the canonical data, custom (self-describing) events and contexts as well as derived contexts. During shredding, the self-describing (unstruct) data gets shredded out of the enriched event and is kept in the “shredded” bucket in JSON format, ready to be loaded into Redshift. You can see a visualization of this process here (diagram at the bottom): https://github.com/snowplow/snowplow/wiki/StorageLoader.
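To make the split concrete, here is a rough Python sketch of the idea. It is not the actual Snowplow shredder: the field list is cut down to four columns (the real enriched TSV has far more), the wrapper envelopes around unstruct_event and contexts are left out, and the `shred` function, the `root_id` key and the `com.acme` schemas are all made up for illustration.

```python
import json

# Hypothetical, heavily simplified subset of the canonical fields.
CANONICAL_FIELDS = ["event_id", "collector_tstamp", "event", "app_id"]


def shred(enriched: dict):
    """Split one enriched event into an atomic TSV row and self-describing JSON records."""
    # The canonical fields become one tab-separated line for the events table.
    atomic_row = "\t".join(str(enriched.get(f, "")) for f in CANONICAL_FIELDS)

    shredded = []
    # In this sketch unstruct_event holds one self-describing JSON,
    # while contexts and derived_contexts hold lists of them.
    for field in ("unstruct_event", "contexts", "derived_contexts"):
        raw = enriched.get(field)
        if not raw:
            continue
        payload = json.loads(raw)
        entities = payload if isinstance(payload, list) else [payload]
        for entity in entities:
            shredded.append({
                "schema": entity["schema"],
                # Illustrative key that links the record back to its parent event.
                "root_id": enriched["event_id"],
                "data": entity["data"],
            })
    return atomic_row, shredded


example = {
    "event_id": "e1",
    "collector_tstamp": "2018-08-06 10:25:00",
    "event": "unstruct",
    "app_id": "web",
    "unstruct_event": json.dumps(
        {"schema": "iglu:com.acme/click/jsonschema/1-0-0", "data": {"target": "buy"}}
    ),
    "contexts": json.dumps(
        [{"schema": "iglu:com.acme/user/jsonschema/1-0-0", "data": {"plan": "free"}}]
    ),
}

atomic_row, shredded = shred(example)
```

Here `atomic_row` stands in for a line under atomic-events/, and each item of `shredded` stands in for a JSON line that would sit next to it in the shredded folder, destined for its own Redshift table.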

Shredding is only applicable if you need to load the data into Redshift. The canonical (TSV) data is loaded into the events table with the COPY command, while the self-describing (JSON) data is loaded with COPY using its JSON format option.
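As a rough illustration of the two load styles, the Python helpers below only build the SQL text for a Redshift COPY of tab-delimited data and a COPY of JSON data driven by a JSONPaths file. These are not the exact statements the Snowplow loader issues; the helper names, the S3 paths, the IAM role and the `atomic.com_acme_click_1` table are placeholders.

```python
def copy_tsv(table: str, s3_path: str, iam_role: str) -> str:
    """Build a COPY statement for tab-separated canonical events."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' DELIMITER '\\t'"
    )


def copy_json(table: str, s3_path: str, iam_role: str, jsonpaths: str) -> str:
    """Build a COPY statement for self-describing JSON, mapped via a JSONPaths file."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' JSON '{jsonpaths}'"
    )


# Placeholder values, for illustration only.
role = "arn:aws:iam::123456789012:role/example-loader"
events_sql = copy_tsv(
    "atomic.events", "s3://example-bucket/shredded/atomic-events/", role
)
click_sql = copy_json(
    "atomic.com_acme_click_1",
    "s3://example-bucket/shredded/com.acme/click/",
    role,
    "s3://example-bucket/jsonpaths/com.acme/click_1.json",
)
```

The point is simply that both kinds of shredded output go through COPY, and only the format clause differs.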
