Realtime data pipeline recovery


#1

Hi all,

I have created a tool to replay S3 backups of raw data from the realtime data pipeline: https://github.com/grzegorzewald/SnowplowRecovery

With no modifications, it consumes gzipped backups and writes them to stdout, so you can just pipe the output to a local copy of the realtime enrichment process.

If you need or want to fix data, look at lines 21-23. In theory you can do almost anything (I have used this to fix types).
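
For anyone who wants a rough picture of the flow without opening the repository, here is a minimal Python sketch of the same idea: decompress a gzipped backup, optionally fix each record, and write the result to stdout. This is not the code from SnowplowRecovery. The names `replay` and `fix_record` are hypothetical, and the sketch assumes one record per line, which may not match the actual backup format.

```python
import gzip
import io
import sys


def fix_record(line: bytes) -> bytes:
    """Hypothetical hook for repairing a record; passes it through unchanged."""
    return line


def replay(source, sink, fix=fix_record) -> int:
    """Decompress a gzipped backup and write each (optionally fixed) line to sink.

    `source` may be a file path or a binary file object.
    Assumes one record per line (an assumption, not the tool's documented format).
    Returns the number of records written.
    """
    count = 0
    with gzip.open(source, "rb") as backup:
        for line in backup:
            sink.write(fix(line))
            count += 1
    return count


if __name__ == "__main__":
    # Replay every backup file given on the command line to stdout,
    # so the output can be piped into another process.
    for path in sys.argv[1:]:
        replay(path, sys.stdout.buffer)
```

To change records on the way through, pass your own function as `fix`, for example one that rewrites a field with the wrong type.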


#2

Cool - thanks for sharing @grzegorzewald!