Best approach to incremental data modeling? (etl_tstamp vs. staging)

I've read a bit about data modeling and there are two different approaches I can find: one in the Snowplow documentation and the other in the example modeling playbook for SQL Runner.

One approach is to take the etl_tstamp and make sure that the events for each etl_tstamp are only processed once.

The other approach, as you can see in the SQL Runner playbook example, uses separate schemas (landing, atomic, derived) for data modeling: all events are first added to a landing schema, and only after all the events have been modeled is the data appended to the atomic schema.
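
If I read the playbook example correctly, once the modeling SQL has run, the data gets moved along roughly like this (simplified, just to check my understanding):

```sql
-- Events land in the landing schema first; after modeling, everything is
-- appended to atomic and the landing table is emptied for the next load
INSERT INTO atomic.events
SELECT * FROM landing.events;

DELETE FROM landing.events;
```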

The second approach seems really heavyweight: we would have to create every table twice and do all the copying. I'm also not 100% sure how I would implement the first approach, i.e. writing the processed etl_tstamps to a separate table and counting all the events for a particular etl_tstamp, and I'm not sure what the advantages and disadvantages of each approach would be.
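
My best guess at how the first approach could be implemented is something along these lines (all table names below are just made up by me):

```sql
-- A manifest table that remembers which etl_tstamps have already been modeled
CREATE TABLE IF NOT EXISTS derived.etl_tstamps_processed (
  etl_tstamp TIMESTAMP
);

-- Model only events from loads whose etl_tstamp is not in the manifest yet
INSERT INTO derived.page_views
SELECT
  domain_userid,
  domain_sessionidx,
  page_urlpath,
  collector_tstamp
FROM atomic.events
WHERE event = 'page_view'
  AND etl_tstamp NOT IN (SELECT etl_tstamp FROM derived.etl_tstamps_processed);

-- Afterwards, mark those etl_tstamps as processed
INSERT INTO derived.etl_tstamps_processed
SELECT DISTINCT etl_tstamp
FROM atomic.events
WHERE etl_tstamp NOT IN (SELECT etl_tstamp FROM derived.etl_tstamps_processed);
```

Is that roughly the idea, or am I missing something?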

Hi @tclass,

The examples on GitHub are outdated and will be removed soon (to be replaced with improved documentation & tutorials). You are correct about the second approach - we no longer recommend creating a landing schema for those exact reasons.

Our current approach is more like what I outlined here: Making SQL data models incremental to improve performance [tutorial]
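
In a nutshell: rather than recomputing everything on each run, only the events that arrived since the last run are scanned, and the derived tables are updated from that subset. Very roughly, the pattern looks like this (heavily simplified, with illustrative table names - the tutorial has the full version):

```sql
-- 1. Pull only the events that arrived since the last successful run
DROP TABLE IF EXISTS scratch.new_events;
CREATE TABLE scratch.new_events AS
SELECT *
FROM atomic.events
WHERE etl_tstamp > COALESCE(
  (SELECT MAX(etl_tstamp) FROM derived.manifest),
  TIMESTAMP '1970-01-01 00:00:00'
);

-- 2. Recompute derived rows only for the sessions touched by those events,
--    then swap them into the derived table (delete affected rows, re-insert)
DELETE FROM derived.sessions
WHERE domain_sessionid IN (SELECT DISTINCT domain_sessionid FROM scratch.new_events);

INSERT INTO derived.sessions
SELECT
  domain_sessionid,
  MIN(collector_tstamp) AS session_start,
  MAX(collector_tstamp) AS session_end,
  COUNT(*)              AS events
FROM atomic.events
WHERE domain_sessionid IN (SELECT DISTINCT domain_sessionid FROM scratch.new_events)
GROUP BY 1;

-- 3. Record what was processed so the next run picks up where this one stopped
INSERT INTO derived.manifest
SELECT DISTINCT etl_tstamp FROM scratch.new_events;
```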

Hope this helps!

Christophe
