RDBLoader failing to process SQS messages

anton · May 31, 2021, 3:42pm

How long was it taking to load a folder? It seems that the visibility timeout is way too low: by the time Loader tries to acknowledge (delete) the message, it is already “lost” to SQS. In 1.0.0 we set it to 5 minutes and acknowledge only after the folder has been loaded. As a result, if Loader sees the message again, it is just considered a duplicate and ignored. That only works because we have a manifest in 1.0.0; R35 doesn’t have this feature.
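To illustrate the failure mode described above, here is a toy model in Python. It is only a sketch of the timing logic, not the Loader's code and not the AWS API: `ToyQueue`, `run_loader`, the folder name and all the numbers are hypothetical, and the model assumes (as in the description above) that an acknowledgement arriving after the visibility window has expired does not delete the message.

```python
from dataclasses import dataclass


@dataclass
class ToyQueue:
    """Single-message toy model of a queue with a visibility timeout."""
    body: str
    visibility_timeout: float
    invisible_until: float = 0.0  # the message is visible from t=0
    deleted: bool = False

    def receive(self, now):
        """Return the message if it is visible at `now`, else None."""
        if self.deleted or now < self.invisible_until:
            return None
        self.invisible_until = now + self.visibility_timeout
        return self.body

    def delete(self, now):
        """Acknowledge; in this model it only works inside the window."""
        if self.deleted or now >= self.invisible_until:
            return False
        self.deleted = True
        return True


def run_loader(queue, load_seconds, manifest=None, max_polls=3):
    """Poll a few times and return the folders that were actually loaded."""
    loaded = []
    now = 0.0
    for _ in range(max_polls):
        folder = queue.receive(now)
        if folder is None:
            break
        if manifest is not None and folder in manifest:
            # Duplicate delivery: already loaded, so only acknowledge it.
            queue.delete(now)
            continue
        now += load_seconds  # loading the folder takes this long
        loaded.append(folder)
        if manifest is not None:
            manifest.add(folder)
        queue.delete(now)  # fails if the window expired during the load
    return loaded


folder = "run=example"  # hypothetical folder name
# Timeout shorter than the load, no manifest: the folder is loaded on every poll.
print(len(run_loader(ToyQueue(folder, 30), load_seconds=120)))
# Same timeout with a manifest: the redelivery is recognised and skipped.
print(len(run_loader(ToyQueue(folder, 30), load_seconds=120, manifest=set())))
# Timeout longer than the load: acknowledged on the first attempt.
print(len(run_loader(ToyQueue(folder, 300), load_seconds=120)))
```

In this model the short timeout without a manifest loads the same folder on each of the three polls, while either a longer timeout or a manifest brings it back to a single load.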

I’m still trying to wrap my head around Shredding EMR spark config (IOException: All datanodes ... are bad) - #5 by dadasami to help you migrate to 1.0.1.

Is there anything wrong with our SQS setup?

I think you can set the visibility timeout only in code? If I’m wrong, I definitely recommend setting it to something higher.
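For reference, SQS itself exposes a queue-level default through the `VisibilityTimeout` queue attribute. Below is a minimal Python sketch of raising it, assuming a boto3 SQS client is passed in. The helper name `raise_visibility_timeout` and the 300-second default are illustrative assumptions, and nothing in this thread confirms whether R35 uses the queue default or passes its own timeout when receiving (a per-receive timeout would take precedence over the queue default).

```python
def raise_visibility_timeout(sqs_client, queue_url, seconds=300):
    """Set the queue's default visibility timeout (hypothetical helper).

    `sqs_client` is expected to behave like boto3.client("sqs").
    """
    # SQS accepts a visibility timeout between 0 seconds and 12 hours.
    if not 0 <= seconds <= 43200:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        # Queue attribute values are passed as strings.
        Attributes={"VisibilityTimeout": str(seconds)},
    )
```

It would be called as `raise_visibility_timeout(boto3.client("sqs"), queue_url)`, where `queue_url` is the URL of your own queue. Pick a value comfortably longer than the slowest folder load you have observed.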

We are running the RDBLoader on a single Fargate instance. Does it help if it runs on multiple instances in parallel?

No, certainly not. Loader is designed to run as a singleton to avoid race conditions and to avoid overwhelming Redshift.
