Recalculating big data models - how to do it with better performance?

Hi @danielparedes,

> but when we add new rows […] we have to recalculate from scratch

I’m not sure I follow. If the model is incremental, it shouldn’t need to recalculate from scratch when new rows are added?

On the broader point, that’s indeed a limitation of this kind of model: it’s hard to make large changes without fully recomputing. It might be possible to develop a hybrid approach, where the most expensive steps run incrementally while the rest is recomputed from scratch each time, striking a balance between efficiency and flexibility.
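To make that concrete, here’s a rough sketch of what such a hybrid run could look like, in Python with SQL executed via psycopg2. Every table name, column name, and connection string here is made up for illustration, not taken from an actual model: the expensive enrichment step only touches rows newer than a stored high-water mark, while a small sessions table is cheap enough to rebuild from scratch on every run.

```python
# Hypothetical hybrid incremental/full-recompute model run.
# All tables, columns, and the DSN below are illustrative only.
import psycopg2

# Step 1 (incremental): only enrich events that arrived since the last run.
# A manifest table tracks the high-water mark so reruns don't reprocess rows.
INCREMENTAL_STEP = """
INSERT INTO derived.events_enriched
SELECT e.*
FROM atomic.events e
WHERE e.collector_tstamp > (SELECT max_tstamp FROM derived.manifest);
"""

UPDATE_MANIFEST = """
UPDATE derived.manifest
SET max_tstamp = (SELECT MAX(collector_tstamp) FROM derived.events_enriched);
"""

# Step 2 (full recompute): sessions are awkward to patch incrementally
# (a late-arriving event can change an old session), but the table is small,
# so dropping and rebuilding it each run is affordable.
FULL_RECOMPUTE_STEP = """
DROP TABLE IF EXISTS derived.sessions;
CREATE TABLE derived.sessions AS
SELECT domain_sessionid,
       MIN(collector_tstamp) AS session_start,
       MAX(collector_tstamp) AS session_end,
       COUNT(*)              AS event_count
FROM derived.events_enriched
GROUP BY domain_sessionid;
"""

def run_model(conn):
    with conn.cursor() as cur:
        cur.execute(INCREMENTAL_STEP)     # expensive step: new rows only
        cur.execute(UPDATE_MANIFEST)
        cur.execute(FULL_RECOMPUTE_STEP)  # cheap step: rebuilt from scratch
    conn.commit()

if __name__ == "__main__":
    run_model(psycopg2.connect("dbname=events"))  # hypothetical DSN
```

The trade-off is the one mentioned above: the incremental step is fast but rigid (changing its logic means reprocessing history), while the full-recompute step stays easy to change because nothing downstream depends on its past state.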

As a side note, we’re bullish on Spark as a more scalable and reliable alternative to Redshift for event data modeling: Replacing Amazon Redshift with Apache Spark for event data modeling [tutorial]