Recommendation for reloading Redshift Loader load failures

What is the recommended way to reload events that were not successfully loaded by the Redshift Loader?
For instance, we experienced the Redshift Loader running a pre-loading task of migrating the YAUAA table, which failed because a DB view depended on it. After removing the DB view, the Redshift Loader succeeded in the table migration step and all subsequent events were loaded. But how do we reload the events that previously failed to load? Any experience or thoughts on this will be appreciated.

Our pipeline is: Collector > Kinesis > Stream Enricher > Kinesis > Stream Transformer > S3+SQS > Redshift Loader > Redshift

Hi @BrianKjaerskov, can you find the batch of data that failed to load in S3? In that directory there should be a file called shredding_complete.json. If you manually send this file to the SQS topic, the RDB loader will receive the message and re-attempt to load the batch.

You can repeat that for all batches that failed to load. It is a completely safe operation, because the loader will refuse to re-load batches that have already been loaded once.
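For illustration, re-sending that file could look something like the sketch below (Python with boto3). The bucket, key and queue URL are placeholders for your own pipeline, and it assumes the file contents can be sent to the queue verbatim, since the S3 copy is a backup of the original message.

```python
# Hedged sketch: re-send an existing shredding_complete.json to the loader's SQS queue.
# Bucket, key and queue URL are placeholders -- substitute your own values.
import boto3

S3_BUCKET = "my-transformer-output-bucket"                                # placeholder
S3_KEY = "transformed/run=2022-01-01-00-00-00/shredding_complete.json"    # placeholder path of the failed batch
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/rdb-loader.fifo"  # placeholder

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Download the backed-up message exactly as the transformer wrote it.
body = s3.get_object(Bucket=S3_BUCKET, Key=S3_KEY)["Body"].read().decode("utf-8")

# Re-send it so the loader picks the batch up again. MessageGroupId is only
# needed (and only allowed) if the queue is a FIFO queue; drop it for a
# standard queue. A FIFO queue without content-based deduplication would also
# need a MessageDeduplicationId.
sqs.send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=body,
    MessageGroupId="rdb-loader",
)
```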

Hi @istreeter
Thanks for your reply. No, I don’t have the shredding_complete.json file. When I look at the RDB loader configuration documentation, it seems like the retryQueue writes this file. I haven’t configured the retryQueue yet, because it seems to be an automatic reloading process rather than a manual one, and I am also unsure about its behaviour. The documentation for the retryQueue.size and retryQueue.maxAttempts properties mentions that failures are dropped when the configured limit is exceeded. Does “dropped” mean that failures are not written to the shredding_complete.json file, or just not retried automatically?

Hi @BrianKjaerskov, sorry I told you slightly the wrong thing…

The instructions I gave above currently only work for the Spark transformer, not the streaming transformer. The Spark transformer writes the shredding_complete message to the SQS topic and also to S3. The S3 file is there for exactly this purpose: if a batch fails to load successfully, you have a backup of the message which can be re-sent to the SQS topic.

Unfortunately the streaming transformer does not write the shredding_complete message to S3. This is a feature we will be adding in the next few months, so you can look out for it in a future release. Sadly this won’t help you right now though!

For now, if you want to re-trigger loading of the failed batch then you can still send the message to SQS, but you will need to manually create the message from scratch. This is inconvenient but not impossible. If you look at one of the other messages in the SQS queue then you will see what it needs to look like. The message is base64-encoded JSON and the schema of the message is documented here.

You will need to make sure the base field is set to the S3 directory containing the failed batch, and that the types field correctly lists all of the entities (unstructured events or contexts) in the batch. The timestamps and event count are not important, so don’t worry about getting those fields right.
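In case it is useful, here is a rough sketch (Python with boto3) of what building and sending such a message from scratch might look like. The queue URL, S3 path and schema versions are placeholders, and the field names are illustrative only; copy the exact structure and schema version from one of the existing messages in your queue and from the documented schema rather than from this sketch.

```python
# Hedged sketch: construct a shredding_complete message by hand and send it to
# the loader's SQS queue. Field names, schema versions and values below are
# illustrative placeholders -- verify them against an existing message and the
# documented schema before sending anything.
import base64
import json

import boto3

QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/rdb-loader.fifo"  # placeholder

message = {
    # Check which schema version your transformer/loader pair uses.
    "schema": "iglu:com.snowplowanalytics.snowplow.storage.rdbloader/shredding_complete/jsonschema/1-0-0",
    "data": {
        # Must point at the S3 directory containing the failed batch.
        "base": "s3://my-transformer-output-bucket/transformed/run=2022-01-01-00-00-00/",
        # Must list every entity (unstructured event or context) present in the batch.
        "types": [
            {
                "schemaKey": "iglu:nl.basjes/yauaa_context/jsonschema/1-0-2",  # example entity
                "snowplowEntity": "CONTEXT",
            }
        ],
        # Per the advice above, these do not need to be accurate.
        "timestamps": {
            "jobStarted": "2022-01-01T00:00:00.000Z",
            "jobCompleted": "2022-01-01T00:05:00.000Z",
        },
        "compression": "GZIP",
        "processor": {"artifact": "snowplow-transformer-kinesis", "version": "x.y.z"},
        "count": {"good": 1},
    },
}

# The message body is base64-encoded JSON.
body = base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")

# MessageGroupId is required for FIFO queues; drop it for a standard queue.
boto3.client("sqs").send_message(
    QueueUrl=QUEUE_URL,
    MessageBody=body,
    MessageGroupId="rdb-loader",
)
```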

Hi @istreeter
Thanks for your detailed answer. I will try to recreate the SQS message. I think I can get the S3 path of the failed events from the keys file created by folder monitoring.
I will look forward to this feature in a future release.