Should I use views in Redshift?


#1

We received the following question from one of our customers:

I was wondering if you could offer some advice on the general performance of views in Redshift? We are finding them slow, and would like some pointers on what to do and what not to do. Additionally, we would like to know what your thoughts are on creating materialized views?

The answer might benefit other users as well, so we’re cross-posting it here.

We recommend against using views in Redshift. The 3 main reasons are:

  • views are not materialized, so there is no inherent performance benefit
  • views are hardcoded to the table, not the table name, and difficult to update (if we need to recreate a table in atomic, all views that use that table will break)
  • the Redshift query planner doesn’t optimize through views - so e.g. constraining a SELECT FROM query on a view with a WHERE clause is slower than if the view itself was defined with that same WHERE clause

Rather than create a view, I’d run the exact same SQL and create a table instead (it would in effect be a materialized view). You can use SQL Runner to schedule SQL queries to run as part of the Snowplow pipeline. Each time new data is loaded into Redshift, the SQL queries are run, and the set of derived tables gets updated with the latest data. You can then run queries against these derived tables from a BI tool, rather than from the atomic tables themselves.


Are there still data cube sql files available