Repopulate a single table?


#1

On a newly populated instance of R97, I have accidentally deleted all the rows from a custom “unstructured events” table. All the other Redshift tables are intact.

Is it possible to repopulate the empty table from the shredded archive bucket?


#2

@wleftwich, it should be possible.

  1. Pause the pipeline if on schedule to avoid conflict.
  2. Create a separate bucket and copy all the shredded good files for this specific event from the archive shredded bucket (make sure to retain the same prefixes).
  3. Create a separate config.yml (just in case) with shredded:good pointed to the bucket in step 2.
  4. Run the EmrEtlRunner with --resume-from rdb_load --skip archive_enriched,archive_shredded

NOTE: The options depend on the version of the EmrEtlRunner. Not sure if you actually can use both --resume-from and --skip options at the same time. The idea is to load data only and avoid archiving the files as they are already in archive bucket. If it doesn’t work then archive into a separate “bin” bucket to be deleted after completion.


#3

Great, thanks for the quick advice.


#4

I ended up doing this a different way.

Inspecting the shredded:archive paths specific to this unstruct type, it occurred to me that I could use the Redshift “COPY JSON” sql command and load each path directly into a staging table. That’s what I did, and it seemed to work fine.


#5

Thanks for sharing @wleftwich!


#6

@wleftwich, that’s yet another way to do it indeed. The rdb_load step of the ETL job uses COPY FROM JSON to load shredded types. It is fine if you have just a few runs in the archived folder. However, it would be very tedious if the data for a long period is to be recovered.

Glad it worked out for you.