Snowflake DB Loader Error & Recovery question

I’m currently running Snowflake data loader 0.4.0 and plan to upgrade to 0.5.0 next. I had some bad rows and couldn’t locate them in each file, so I pulled the entire folder “run=2019-09-27-00-15-22” out of the S3 bucket and re-ran the loader process. All the remaining data loaded just fine. How can I reload the directory that I pulled?

  • Can I just return the folder with the data and remove the item from the DynamoDB table?
  • I could just put the cleaned-up text files in the same folder as the next run.
  • I’m hoping that I can just remove the manifest item from DynamoDB.
  • Other ideas?

As always, I greatly appreciate any insights.

Example of the error message

Error during events/enriched/archive/run=2019-09-27-00-15-22/ load. String https://stuff.servername.com/abc/efg/success.php?ppn_bundle=_KsectwQohVIeO195yPTYjlTb1UPjNqtAfdtxWkxuvrDIDTwiMIArKnAdoDnuMogDJmai8KoAAsndLlHRsrCX3MXTZ4cPQnkvhn1MBODSm2cTLNbCfQqzBRfSfDucvU4eEdLd9e...' is too long and would be truncated,  No new columns were added, safe to rerun.

@sonnypolaris, you do not have to pull the run folder out. You can simply set ToSkip to true in the manifest DynamoDB table so the loader passes over that run. Once the issue is resolved (say, you managed to clean up the offending records), switch ToSkip back to false to resume the load.
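
For anyone who would rather script the ToSkip flip than edit the item in the AWS console, here is a minimal sketch. It only builds the arguments for a DynamoDB UpdateItem call. It assumes the manifest's partition key is a string attribute named RunId and that ToSkip is stored as a boolean, so check both against your own table first. The function name is made up for illustration.

```python
def build_toskip_update(table_name, run_id, skip):
    """Build keyword arguments for a DynamoDB UpdateItem call that sets ToSkip.

    Assumptions (verify against your manifest table):
      - the partition key is a string attribute called "RunId"
      - ToSkip is stored as a DynamoDB boolean
    """
    return {
        "TableName": table_name,
        "Key": {"RunId": {"S": run_id}},
        "UpdateExpression": "SET #skip = :value",
        # Refuse to create a brand-new item if the run is not in the manifest.
        "ConditionExpression": "attribute_exists(#key)",
        "ExpressionAttributeNames": {"#skip": "ToSkip", "#key": "RunId"},
        "ExpressionAttributeValues": {":value": {"BOOL": bool(skip)}},
    }
```

With boto3 installed, the result can be passed straight through, for example boto3.client("dynamodb").update_item(**build_toskip_update("my-manifest-table", "enriched/archive/run=2019-09-27-00-15-22/", True)), where "my-manifest-table" is a placeholder for your real table name. Call it again with False once the offending records are cleaned up.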

If you want to reprocess the run folder all over again then yes, you would need to remove/delete the record related to it from the manifest. Don’t forget to treat the run folders in the buckets accordingly. For reprocessing you would want to delete the run folder in the bucket for transformed Snowflake data.
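
A rough sketch of that reprocessing reset, in case it helps: it deletes everything under the run's prefix in the transformed bucket, then removes the run's record from the manifest. The function name, the bucket layout, and the RunId key name are assumptions on my part, not something taken from the loader itself. The two clients are passed in, and are expected to behave like boto3's S3 and DynamoDB clients.

```python
def reset_run(dynamodb, s3, table_name, run_id, bucket, transformed_prefix):
    """Delete one run's transformed output, then drop its manifest record.

    `dynamodb` and `s3` are expected to behave like boto3 clients.
    Assumption: the manifest partition key is a string attribute "RunId".
    Returns the number of S3 objects deleted.
    """
    deleted = 0
    list_args = {"Bucket": bucket, "Prefix": transformed_prefix}
    while True:
        page = s3.list_objects_v2(**list_args)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # One listing page holds at most 1000 keys, which is also the
            # per-request limit for delete_objects.
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            break
        list_args["ContinuationToken"] = page["NextContinuationToken"]

    # Remove the manifest record last, so a failure above leaves it in place.
    dynamodb.delete_item(TableName=table_name, Key={"RunId": {"S": run_id}})
    return deleted
```

Make sure transformed_prefix ends with the run folder name and a trailing slash, so that only the one run is removed. Deleting the manifest record last means a failed S3 cleanup does not leave the run looking brand new to the loader.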

This worked! I had moved some folders and files around, so I had to recreate the proper state, but once I undid my so-called “fix” all went well. Thanks for your help.

FYI, upgrading to 0.5 with the bad rows really helped.