Questions regarding data modeling & analysis



I’m trying to get a basic batch pipeline running with the Clojure collector, and I’ve reached the point where data is being loaded into the table in Redshift.

Now I’m trying to wrap my head around the data modeling and analysis part, and a few things are unclear to me:

  1. The link to 5-data-modeling/sql-runner/redshift in the getting-started-with-data-modeling guide is dead. I guess it should point to 5-data-modeling/web-model/redshift instead, right?

  2. I’m not sure I understand the difference in purpose between 5-data-modeling/web-model/redshift and 5-data-modeling/web-model/sql-runner.

  3. When is the so-called sql-runner meant to be used? Should it be installed separately from EmrEtlRunner?

  4. The analysis section recommends setting up some prebuilt views (Setting-up-the-prebuilt-views-in-Redshift-and-PostgreSQL). How do they differ from 5-data-modeling/web-model/redshift? Do you need one or both?





sql-runner is a separate application that executes your data modeling SQL queries in a specified order (so you don’t have to run each query manually every day or hour). It has its own config file, which specifies the queries to run, their order, and the database to run them against.
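To make the idea of “queries run in a specified order” concrete, here is a minimal sketch of what an ordered query runner does. It is not sql-runner itself and does not use its config format: the `STEPS` list, the `run_steps` function, and the `events`/`daily_users` tables are hypothetical names for illustration, and an in-memory SQLite database stands in for Redshift.

```python
import sqlite3

# Hypothetical playbook: ordered steps, each holding SQL statements to run in order.
# This only mimics the idea behind sql-runner's config; it is NOT its real file format.
STEPS = [
    {"name": "01-clean-up", "queries": ["DROP TABLE IF EXISTS daily_users"]},
    {"name": "02-daily-users", "queries": [
        """CREATE TABLE daily_users AS
           SELECT day, COUNT(DISTINCT user_id) AS users, COUNT(*) AS events
           FROM events
           GROUP BY day""",
    ]},
]


def run_steps(conn, steps):
    """Execute each step's queries in the listed order, committing after each step."""
    completed = []
    for step in steps:
        for sql in step["queries"]:
            conn.execute(sql)  # any sqlite3.Error propagates, so later steps never run
        conn.commit()
        completed.append(step["name"])
    return completed


if __name__ == "__main__":
    # Toy fixture: in the real pipeline the raw events would already be loaded.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, day TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("u1", "2016-01-01"), ("u2", "2016-01-01"), ("u1", "2016-01-02")],
    )
    print(run_steps(conn, STEPS))
    print(conn.execute("SELECT * FROM daily_users ORDER BY day").fetchall())
```

Because an error in any query propagates out of `run_steps`, later steps never run against a half-built model, and the leading `DROP TABLE IF EXISTS` step makes the whole run safe to repeat on every schedule.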

The data-modeling/…/redshift folders provide example SQL queries for creating basic data models (higher-level aggregate tables) in Redshift. I imagine a lot of people customize them (as we do) to build in their own business logic. The idea is to build these higher-level aggregate tables once, so you don’t have to write long, complicated queries against the raw event data every time you need to answer a basic business question.
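A small illustration of the aggregate-table idea, with SQLite standing in for Redshift. The `events` columns borrow Snowplow-style names, but the schema is simplified and hypothetical, as is the `sessions` model; the real web-model SQL is considerably richer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical stand-in for the raw events table loaded by the pipeline.
conn.execute("""CREATE TABLE events (
    domain_userid TEXT, domain_sessionidx INTEGER, event TEXT, collector_tstamp TEXT)""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("u1", 1, "page_view", "2016-01-01 10:00:00"),
    ("u1", 1, "page_view", "2016-01-01 10:05:00"),
    ("u1", 2, "page_view", "2016-01-03 09:00:00"),
    ("u2", 1, "page_view", "2016-01-01 12:00:00"),
    ("u2", 1, "page_ping", "2016-01-01 12:00:30"),
])

# Modeling step: run once per batch to build a higher-level sessions table.
conn.execute("""
CREATE TABLE sessions AS
SELECT domain_userid,
       domain_sessionidx,
       MIN(collector_tstamp) AS session_start,
       MAX(collector_tstamp) AS session_end,
       SUM(CASE WHEN event = 'page_view' THEN 1 ELSE 0 END) AS page_views
FROM events
GROUP BY domain_userid, domain_sessionidx
""")

# Business question ("average page views per session?") answered with a short
# query against the model table instead of a GROUP BY over raw events.
avg_pv = conn.execute("SELECT AVG(page_views) FROM sessions").fetchone()[0]
print(avg_pv)
```

The `CREATE TABLE ... AS` step is the kind of query you would schedule (for example as a step in a runner like sql-runner), and every later question about sessions becomes a one-line query.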