Redshift maintenance best practices


#1

Hi all,

I’m starting to study Redshift maintenance best practices to improve our database performance. A few issues we have:

  • VACUUM on atomic.events and other large derived tables takes forever, even when the unsorted region only covers the last ~2 weeks. VACUUM SORT ONLY takes forever too.
  • Deep copies also take a long time, ~20h for atomic.events with ~700M rows

I believe the reason for the large unsorted region is the deduplication queries, which re-insert data in non-SORTKEY order. I expect R88 will greatly improve performance, as deduplication will be handled in enrich/shred and all atomic tables can be loaded in SORTKEY order.
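For reference, the deep copy we run is essentially the following (a sketch; this assumes collector_tstamp is the table's SORTKEY, and names are from our setup):

```sql
-- Deep copy sketch: rebuild the table in SORTKEY order, then swap names.
-- Run inside a transaction so readers never see a missing table.
BEGIN;

CREATE TABLE atomic.events_new (LIKE atomic.events);

INSERT INTO atomic.events_new
SELECT *
FROM atomic.events
ORDER BY collector_tstamp;      -- load in SORTKEY order

ALTER TABLE atomic.events RENAME TO events_old;
ALTER TABLE atomic.events_new RENAME TO events;

COMMIT;

-- Only after verifying row counts:
-- DROP TABLE atomic.events_old;
```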

Any best practices you guys learned from your databases?

Cheers,
Bernardo


#2

Hi @bernardosrulzon,

Maybe your cluster is short on free disk space. We had the same issue a couple of months ago when doing a deep copy. In the graph below you can see how we tried twice (around Tuesday 21st and Wednesday 22nd) before finding the correct cluster size and succeeding with the deep copy (Thursday 23rd).

Try to upscale your cluster temporarily. Ours has 11 billion rows and took ~6 hours to upscale/downscale.

AWS Support says:

When doing a deep copy of a highly unsorted table, Redshift needs to sort the table before inserting it into the new one.

This sort operation takes place in an intermediate temporary table, which is at first held in memory but, as the data set is too big, eventually spills over to disk.

However, those temporary tables are not compressed, and Redshift allocates temporary disk space for the operation, which results in a disk-full error if there is not sufficient space for the temporary data.
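Before retrying, it's worth checking how close to full each node is. A query along these lines against STV_PARTITIONS (a sketch) shows per-node disk usage:

```sql
-- Per-node disk usage; 'used' and 'capacity' are in 1 MB blocks.
SELECT owner AS node,
       SUM(used)     AS used_mb,
       SUM(capacity) AS capacity_mb,
       ROUND(100.0 * SUM(used) / SUM(capacity), 1) AS pct_used
FROM stv_partitions
GROUP BY owner
ORDER BY owner;
```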

Once you have your tables sorted, try to vacuum them often (VACUUM to 100 percent).
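By default VACUUM FULL only sorts to a 95 percent threshold, so to keep tables fully sorted you have to ask for 100 explicitly, e.g.:

```sql
-- Reclaim space and fully sort (the default threshold is 95 percent).
VACUUM FULL atomic.events TO 100 PERCENT;
```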

Another best practice we learnt from our Snowplow database is that the default atomic.events definition is not 100% optimal for all cases, which is understandable, as we think there is no one-size-fits-all encoding. After doing the above deep copy, and before downscaling our cluster, we ran the AWS Redshift Column Encoding Utility. This reduced atomic.events from 66% of our cluster to 20%, and overall cluster usage from 94% (beginning of the chart below) to <50% (end of the chart, after downscaling to the original cluster size). Just beware: keep a backup of the original events table before dropping it.
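To spot which tables are worth re-encoding or deep-copying, a query like this against SVV_TABLE_INFO (a sketch) lists the biggest and most unsorted ones:

```sql
-- Largest tables, with how sorted they are and whether columns are encoded.
SELECT "table",
       size      AS size_mb,    -- total size in 1 MB blocks
       pct_used,                -- percent of cluster space the table uses
       unsorted,                -- percent of rows in the unsorted region
       encoded                  -- whether any column has compression defined
FROM svv_table_info
ORDER BY size DESC
LIMIT 10;
```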


#3

Hi Bernardo,

Are you using…

ENCODE runlength and ENCODE raw, and

DISTSTYLE KEY
DISTKEY (root_id)
SORTKEY (root_tstamp)

…in your CREATE TABLE statements?
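For what it's worth, a shredded-context table with those settings looks roughly like this (column names and encodings are illustrative; SORTKEY columns are usually left ENCODE raw):

```sql
CREATE TABLE atomic.com_acme_my_context_1 (
    root_id     CHAR(36)     ENCODE raw NOT NULL,  -- join key back to atomic.events
    root_tstamp TIMESTAMP    ENCODE raw NOT NULL,  -- SORTKEY column, keep raw
    my_flag     BOOLEAN      ENCODE runlength,     -- low-cardinality column
    my_value    VARCHAR(255) ENCODE zstd
)
DISTSTYLE KEY
DISTKEY (root_id)
SORTKEY (root_tstamp);
```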

There is a doc with 10 best practices for Redshift querying.

But there is no doc for best practices on Redshift “maintenance”. Search for “WLM dynamic memory allocation”; there you can change settings and create rules for query memory allocation…

– query_concurrency
– memory_percent_to_use
– scan_row_count

http://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-dynamic-memory-allocation.html
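A wlm_json_configuration value using those properties could look like this (a sketch; the queue names, limits, and scan_row_count threshold are made up for illustration):

```json
[
  {
    "query_group": ["etl"],
    "query_concurrency": 5,
    "memory_percent_to_use": 60,
    "rules": [
      {
        "rule_name": "abort_huge_scans",
        "predicate": [
          { "metric_name": "scan_row_count", "operator": ">", "value": 1000000000 }
        ],
        "action": "abort"
      }
    ]
  },
  {
    "query_concurrency": 5,
    "memory_percent_to_use": 40
  }
]
```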

We run a VACUUM on all tables in Redshift… I know it’s not the only best practice for better performance… but we run it on all context tables first, and then on atomic.events and other low-performance tables.


#4

From time to time it’s also useful to run an
"analyze compression" query on the tables, and recreate the tables with the recommended compression encodings.
For some time now the compression encoding “zstd” has been available, with really nice results; we could reduce the allocated space by up to 40% for some tables.
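For reference, it’s just the following (note it takes an exclusive table lock while sampling rows, so run it off-peak):

```sql
-- Suggest the best encoding per column based on a sample of rows.
ANALYZE COMPRESSION atomic.events;
```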