RDB loader container fails when there's no new shredded data

Deyan_Deyanov · June 29, 2021, 10:48am

Hi all,

We’ve set up the streaming Snowplow components (enricher, shredder and RDB loader) and we’re loading events into Redshift. However, we observe something weird when no events are going through the pipeline, for example during periods of very low traffic, or when we’re redeploying our website or putting it briefly into maintenance mode. In those cases the shredder component still sends new messages to the SQS queue, and the RDB loader consumes those messages and then tries to load into Redshift S3 folders which don’t exist. As a result, the RDB loader container fails immediately.

Let me illustrate it with some logs. We recently had a prolonged downtime for our service which sends events to Snowplow, and as a result no events were passing through the pipeline.

This was the log from the shredder component:

We’ve currently configured the shredder component with a 2-minute “windowing”. But then this is what happened in the same minute when the RDB loader received the message:

Is this expected behaviour?
We’re using version 1.1.0 for the shredder and loader containers.

BenB · July 6, 2021, 9:02am

Hi @Deyan_Deyanov ,

This is weird: the loader should not run COPY statements if there is no shredded data. The check happens here.

Could you please share the content of run=2021-06-29-06-18-00/shredding_complete.json ?

Deyan_Deyanov · July 13, 2021, 9:17am

Hi Ben,

The weird thing is that there is no such folder in S3. It looks like when there are no events this folder is either not created, or it’s empty and S3 automatically hides or removes it as a result. For example, during this specific period we had a continuous flow of events between 2021-06-29-06-00-00 and 2021-06-29-06-16-00, then a small batch of events at 2021-06-29-06-26-00, and no new events after that for the remainder of that hour. And this is what we have as folders in S3:

BenB · July 22, 2021, 11:23am

Hi @Deyan_Deyanov ,

Sorry, I missed the fact that you are using the streaming version of the shredder. This version is still in alpha and is not production-ready yet. It’s possible that the state used by the app during a window does not get reinitialized as it should when there is no data. We will work on the next phase of development this quarter and will check that.

In the meantime, we encourage you to set up the batch shredder.
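
For anyone trying to confirm the same symptom, a minimal sketch of how to spot windows that have no run= folder in S3 is shown here. It is not part of Snowplow: the function names are hypothetical, and it assumes one folder per window, named after the window start in the run=YYYY-MM-DD-HH-MM-SS form seen in this thread, with windows aligned to the chosen start time. The list of existing folder names has to be supplied by the caller, for example from an S3 listing.

```python
from datetime import datetime, timedelta

# Folder naming taken from the thread, e.g. run=2021-06-29-06-18-00.
RUN_FORMAT = "run=%Y-%m-%d-%H-%M-%S"


def expected_run_folders(start, end, window_minutes=2):
    """Yield the run= folder name for every window start in [start, end).

    Assumption: windows are aligned to `start` and each window produces
    exactly one folder named after its start time.
    """
    step = timedelta(minutes=window_minutes)
    current = start
    while current < end:
        yield current.strftime(RUN_FORMAT)
        current += step


def missing_run_folders(start, end, existing, window_minutes=2):
    """Return the expected folder names that are absent from `existing`."""
    present = set(existing)
    return [
        name
        for name in expected_run_folders(start, end, window_minutes)
        if name not in present
    ]


if __name__ == "__main__":
    # Hypothetical listing: only the 06:16 window has a folder.
    existing = ["run=2021-06-29-06-16-00"]
    gaps = missing_run_folders(
        datetime(2021, 6, 29, 6, 16),
        datetime(2021, 6, 29, 6, 22),
        existing,
    )
    for name in gaps:
        print(name)
```

Any folder name reported here that is also referenced in an SQS message is a candidate for the failure described above, since the loader would be asked to load a folder that was never written.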
