Recalculating big data models - how to do it with better performance?

Hi @danielparedes,

> but when we add new rows […] we have to recalculate from scratch

I’m not sure I follow. If the model is incremental, it shouldn’t need to recalculate from scratch when new rows are added?

On the broader point, that’s indeed a limitation of this kind of model: it’s hard to make large changes without fully recomputing. It might be possible to develop a hybrid approach, where the most expensive steps run incrementally while the rest is recomputed from scratch each time, striking a balance between efficiency and flexibility.
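To make that concrete, here’s a rough sketch of what such a hybrid run could look like, in Python with SQL executed via psycopg2. Every table name, column name, and connection string here is made up for illustration, not taken from an actual model: the expensive enrichment step only touches rows newer than a stored high-water mark, while a small sessions table is cheap enough to rebuild from scratch on every run.

```python
# Hypothetical hybrid incremental/full-recompute model run.
# All tables, columns, and the DSN below are illustrative only.
import psycopg2

# Step 1 (incremental): only enrich events that arrived since the last run.
# A manifest table tracks the high-water mark so reruns don't reprocess rows.
INCREMENTAL_STEP = """
INSERT INTO derived.events_enriched
SELECT e.*
FROM atomic.events e
WHERE e.collector_tstamp > (SELECT max_tstamp FROM derived.manifest);
"""

UPDATE_MANIFEST = """
UPDATE derived.manifest
SET max_tstamp = (SELECT MAX(collector_tstamp) FROM derived.events_enriched);
"""

# Step 2 (full recompute): sessions are awkward to patch incrementally
# (a late-arriving event can change an old session), but the table is small,
# so dropping and rebuilding it each run is affordable.
FULL_RECOMPUTE_STEP = """
DROP TABLE IF EXISTS derived.sessions;
CREATE TABLE derived.sessions AS
SELECT domain_sessionid,
       MIN(collector_tstamp) AS session_start,
       MAX(collector_tstamp) AS session_end,
       COUNT(*)              AS event_count
FROM derived.events_enriched
GROUP BY domain_sessionid;
"""

def run_model(conn):
    with conn.cursor() as cur:
        cur.execute(INCREMENTAL_STEP)     # expensive step: new rows only
        cur.execute(UPDATE_MANIFEST)
        cur.execute(FULL_RECOMPUTE_STEP)  # cheap step: rebuilt from scratch
    conn.commit()

if __name__ == "__main__":
    run_model(psycopg2.connect("dbname=events"))  # hypothetical DSN
```

The trade-off is the one mentioned above: the incremental step is fast but rigid (changing its logic means reprocessing history), while the full-recompute step stays easy to change because nothing downstream depends on its past state.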

As a side note, we’re bullish on Spark as a more scalable and reliable alternative to Redshift for event data modeling: Replacing Amazon Redshift with Apache Spark for event data modeling [tutorial]