I’m experiencing strange spikes in disk space used on Redshift every 6 hours, and I can’t tell for sure what is causing them. I know that no data will be lost if disk space reaches 100% in Redshift, but I want to understand what is causing this and whether I should be alarmed.
The EmrEtlRunner, StorageLoader and SqlRunner run every 6 hours with no problems, but around the time the EMR job finishes an alarm is triggered in Redshift (PercentageDiskSpaceUsed is over 80).
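To correlate the CloudWatch alarm with what the cluster itself reports, a per-node disk usage query against the `STV_PARTITIONS` system table can help (a minimal sketch, assuming a user with access to Redshift system tables):

```sql
-- Per-node disk usage; STV_PARTITIONS reports raw disk blocks (1 MB each).
-- Run this around the time the alarm fires to see which node is filling up.
SELECT owner AS node,
       SUM(used)     AS used_blocks,
       SUM(capacity) AS capacity_blocks,
       ROUND(SUM(used)::float / SUM(capacity) * 100, 1) AS pct_used
FROM stv_partitions
GROUP BY owner
ORDER BY owner;
```

If one node is much fuller than the others, the spike may be amplified by data skew rather than overall volume.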
Redshift alarm history from CloudWatch on 30/07/2017:
EmrEtlRunner history on 30/07/2017:
For example, if I inspect Redshift for the alarm triggered on 30/07 at 05:40, I can see it is right after the creation of web.page_views_tmp and the loading of page_views_tmp. Is loading all the data into memory (the green bar) causing this spike?
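One way to check whether the page_views_tmp steps are spilling to disk (which would explain a temporary PercentageDiskSpaceUsed spike) is the `is_diskbased` flag in `SVL_QUERY_SUMMARY`. A sketch, assuming system-table access; the time window and limit are arbitrary:

```sql
-- Recent query steps that spilled intermediate results to disk.
-- is_diskbased = 't' means the step exceeded its memory allocation.
SELECT q.query,
       TRIM(q.querytxt) AS sql_snippet,
       s.step,
       s.rows,
       s.workmem
FROM svl_query_summary s
JOIN stl_query q ON q.query = s.query
WHERE s.is_diskbased = 't'
ORDER BY q.starttime DESC
LIMIT 20;
```

If the CREATE TABLE ... AS / INSERT statements for web.page_views_tmp show up here, the spike is transient working space for those queries, not permanent table growth.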
For the alarm triggered on 30/07 at 10:12 it is also the creation of web.page_views_tmp.
My guess is that this is happening in one of the SqlRunner processes and is completely normal.
- Is the data insertion into Redshift truly related to the alarm I see?
- The SqlRunner is invoking the following playbooks: deduplicate, refined-model-unload, web-model, refined-model-add.
- Should I worry and add another node to the Redshift cluster?
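Before resizing, it may be worth checking whether any table is actually growing, as opposed to transient temp-table churn. `SVV_TABLE_INFO` shows permanent table sizes (a sketch, assuming system-table access; run it shortly after the playbooks finish):

```sql
-- Largest permanent tables by size (size is in 1 MB blocks).
-- If the *_tmp model tables appear here only briefly, the spikes
-- are working space from the SQL Runner playbooks, not data growth.
SELECT "schema", "table", size AS size_mb, pct_used, tbl_rows
FROM svv_table_info
ORDER BY size DESC
LIMIT 20;
```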
Thanks, and looking forward to some explanations.