Enrich old data


#1

Hi,

We are using the EmrEtlRunner flow to batch-process our events, and we have just started enriching them with the API Request and SQL Query enrichments. Is there any way to enrich the old events that are already in Redshift?


#2

@ramandamodar, the only way is to reprocess the raw archived files. You could run a separate “recovery” pipeline that loads the data into a different Redshift schema. Once it has completed, delete the old records from the original schema and copy the newly enriched ones over. Bear in mind that the rows in the self-describing event tables must be deleted before the corresponding rows in the parent events table. Depending on the volume to recover, you might do it the other way round: copy the latest (“properly enriched”) records into the new schema, drop the old schema, and then rename the new schema to the original name.
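The two options above can be sketched as a small Python helper that only builds the Redshift SQL statements, so the order of operations is easy to review before running anything. This is a hypothetical illustration, not part of any Snowplow tooling. It assumes: the production data lives in the default `atomic` schema with the parent table `events`; the recovery pipeline loaded into a schema with identical DDL (so `SELECT *` lines up column for column); child tables link to the parent through `root_tstamp`, which mirrors the parent's `collector_tstamp`; and the schema and child table names passed in (e.g. `atomic_recovery`, `com_acme_lookup_1`) are placeholders.

```python
from typing import List

EVENTS = "events"  # assumed name of the parent events table in both schemas


def delete_and_copy_sql(prod: str, recovery: str, children: List[str],
                        start: str, end: str) -> List[str]:
    """Option 1: replace the reprocessed window [start, end) in place.

    Child (self-describing event) tables are cleared before the parent
    events table. Child rows are matched on root_tstamp, assumed to
    mirror the parent's collector_tstamp.
    """
    def window(col: str) -> str:
        # Half-open interval so adjacent windows never overlap.
        return f"{col} >= '{start}' AND {col} < '{end}'"

    stmts = ["BEGIN;"]
    for t in children:
        stmts.append(f"DELETE FROM {prod}.{t} WHERE {window('root_tstamp')};")
    stmts.append(f"DELETE FROM {prod}.{EVENTS} WHERE {window('collector_tstamp')};")
    # Assumes identical DDL in both schemas, so SELECT * lines up.
    for t in [EVENTS] + children:
        stmts.append(f"INSERT INTO {prod}.{t} SELECT * FROM {recovery}.{t};")
    stmts.append("COMMIT;")
    return stmts


def copy_and_rename_sql(prod: str, recovery: str, children: List[str],
                        since: str) -> List[str]:
    """Option 2: copy the already properly enriched rows (>= since) into
    the recovery schema, then replace the old schema with it.

    Assumes the recovery schema already holds everything before `since`.
    DROP SCHEMA ... CASCADE is destructive: take a snapshot first.
    """
    stmts = [
        f"INSERT INTO {recovery}.{EVENTS} SELECT * FROM {prod}.{EVENTS} "
        f"WHERE collector_tstamp >= '{since}';"
    ]
    for t in children:
        stmts.append(
            f"INSERT INTO {recovery}.{t} SELECT * FROM {prod}.{t} "
            f"WHERE root_tstamp >= '{since}';"
        )
    stmts.append(f"DROP SCHEMA {prod} CASCADE;")
    stmts.append(f"ALTER SCHEMA {recovery} RENAME TO {prod};")
    return stmts


if __name__ == "__main__":
    # Hypothetical names: recovery schema and one enrichment context table.
    for s in delete_and_copy_sql("atomic", "atomic_recovery",
                                 ["com_acme_lookup_1"], "2019-01-01", "2019-04-01"):
        print(s)
```

The first option keeps the original schema (and any views or grants attached to it) and wraps the swap in a transaction. The second avoids moving the large historical volume twice, but anything else defined in the dropped schema would have to be recreated.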