We’ve set up the streaming Snowplow components (enricher, shredder and RDB loader) and we’re loading events into Redshift. However, we observe something weird when no events are going through the pipeline. For example, during periods of very low traffic, or when we’re redeploying our website or briefly putting it in maintenance mode, the shredder still sends new messages to the SQS queue. The RDB loader consumes those messages and then tries to load S3 folders which don’t exist into Redshift. As a result, the RDB loader container fails immediately.
Let me illustrate it with some logs. We recently had a prolonged downtime for our service which sends events to Snowplow, and as a result no events were passing through the pipeline.
This was the log from the shredder component:
We’ve currently configured the shredder component with a 2-minute “windowing”. But then this is what happened in the same minute, when the RDB loader received the message:
Is this expected behaviour?
We’re using version 1.1.0 for the shredder and loader containers.