We recently added a new field to one of our custom context tables. The developer who created the payload for this context assumed at the time that the field's values would fit into a numeric(3,2) column in Redshift. As it turned out, some of the values have 10 to 12 decimal places, so we are now getting errors during enrichment such as:
remainder of division is not zero (1.100000023841858 / 0.01)
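The error reads like a JSON Schema `multipleOf: 0.01` check failing: 1.100000023841858 divided by 0.01 leaves a non-zero remainder. Below is a minimal sketch of that check and of rounding to 2 decimal places. It assumes the context's schema declares `multipleOf: 0.01` for the field and that half-up rounding is acceptable; `is_multiple_of_step` and `to_two_places` are hypothetical names, not Snowplow APIs.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: the schema declares "multipleOf": 0.01 for this field,
# matching the 2-decimal scale of a numeric(3,2) Redshift column.
STEP = Decimal("0.01")


def is_multiple_of_step(value: float) -> bool:
    """Mimic a multipleOf check using the float's shortest decimal repr."""
    return Decimal(repr(value)) % STEP == 0


def to_two_places(value: float) -> float:
    """Round half-up to 2 decimal places before sending the value."""
    return float(Decimal(repr(value)).quantize(STEP, rounding=ROUND_HALF_UP))


raw = 1.100000023841858
print(is_multiple_of_step(raw))  # False: the remainder is 0.000000023841858
fixed = to_two_places(raw)
print(fixed, is_multiple_of_step(fixed))  # 1.1 True
```

Going through `Decimal(repr(value))` rather than float arithmetic avoids binary floating-point remainders that would make `1.1 % 0.01` look non-zero.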
The developer has since added a check to his JS code to make sure the values sent to our collector will not cause this error.
The problem I am trying to solve now is to take all the raw archived events, change the value of this custom context field to one with 2 decimal places, save them again in the correct format, and reprocess them through our Snowplow pipeline. I believe the raw archived files are LZO-compressed Thrift files, and I am having a hard time figuring out how to process them. I have tried following the suggestions in this post https://www.snowflake-analytics.com/blog/2016/12/13/decoding-snowplow-real-time-bad-rows-thrift with no success: my Python script cannot parse the Thrift file. I did decompress the .lzo files with lzop first, so I am not sure what the issue is.
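The linked post decodes the base64 `line` of a bad row as a Thrift `CollectorPayload`. To see exactly where a Python parser gets stuck, here is a minimal, stdlib-only sketch of a schema-less reader for Thrift's binary protocol that returns `{field_id: value}`. It assumes a single, unframed record serialized with TBinaryProtocol (Thrift's default serializer); `BinaryReader` and `decode_payload` are hypothetical names. If I remember `collector-payload.thrift` correctly, fields 330 and 340 hold the querystring and body, but please check the IDL. Also, the raw LZO archive may wrap records in an extra block framing (elephant-bird's block format, if I recall correctly), so after lzop the file is probably not a plain sequence of Thrift structs. That would explain a parser failing right at the start of the file.

```python
import base64
import struct

# Thrift TBinaryProtocol wire type codes
T_STOP, T_BOOL, T_BYTE, T_DOUBLE = 0, 2, 3, 4
T_I16, T_I32, T_I64, T_STRING = 6, 8, 10, 11
T_STRUCT, T_MAP, T_SET, T_LIST = 12, 13, 14, 15

# Fixed-width types and their big-endian struct formats
_FIXED = {T_BOOL: ">?", T_BYTE: ">b", T_DOUBLE: ">d",
          T_I16: ">h", T_I32: ">i", T_I64: ">q"}


class BinaryReader:
    """Schema-less reader for ONE unframed TBinaryProtocol struct (assumption).

    Strings are returned as raw bytes, since Thrift 'string' and 'binary'
    share the same wire type.
    """

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def _take(self, n: int) -> bytes:
        if n < 0 or self.pos + n > len(self.data):
            raise ValueError(f"bad length {n} at offset {self.pos}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def _unpack(self, fmt: str):
        return struct.unpack(fmt, self._take(struct.calcsize(fmt)))[0]

    def read_value(self, ttype: int):
        if ttype in _FIXED:
            return self._unpack(_FIXED[ttype])
        if ttype == T_STRING:  # i32 length prefix, then the bytes
            return self._take(self._unpack(">i"))
        if ttype == T_STRUCT:
            return self.read_struct()
        if ttype == T_MAP:  # key type, value type, i32 size
            ktype, vtype, size = struct.unpack(">bbi", self._take(6))
            result = {}
            for _ in range(size):
                key = self.read_value(ktype)
                result[key] = self.read_value(vtype)
            return result
        if ttype in (T_LIST, T_SET):  # element type, i32 size
            etype, size = struct.unpack(">bi", self._take(5))
            return [self.read_value(etype) for _ in range(size)]
        raise ValueError(f"unknown Thrift type {ttype} at offset {self.pos}")

    def read_struct(self) -> dict:
        fields = {}
        while True:
            ttype = self._unpack(">b")
            if ttype == T_STOP:
                return fields
            field_id = self._unpack(">h")
            fields[field_id] = self.read_value(ttype)


def decode_payload(b64_line: str) -> dict:
    """Decode a base64-encoded CollectorPayload, e.g. a bad row's 'line'."""
    return BinaryReader(base64.b64decode(b64_line)).read_struct()
```

A `ValueError` with an "unknown Thrift type" or "bad length" message at offset 0 is a good hint that the input has extra framing around the records rather than bare Thrift.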
I am writing to find out whether others have suggestions on how to accomplish this. I have searched Discourse and found topics where others have solutions for processing the same Thrift files, but in my case I need to correct a value before reprocessing them. Could I take the payloads from my enriched/bad/ S3 bucket and send them to the collector again as new requests? Or could I take payloads from enriched/bad/, change the value of the new field to one with 2 decimal places, then add them back to the /enriched/good/ S3 bucket and have Snowplow pick them up from there?
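Whichever route is taken, the fix itself amounts to rewriting the contexts inside each payload. In the Snowplow tracker protocol, contexts travel either as plain JSON in the `co` parameter or base64url-encoded in `cx`. Below is a minimal sketch for a GET querystring (for example, field 330 of a decoded payload). The schema prefix `iglu:com.acme/my_context/jsonschema/` and field name `score` are hypothetical placeholders. POST bodies would need the same treatment applied to each event in the body's `data` array, which this sketch does not cover.

```python
import base64
import json
from decimal import Decimal, ROUND_HALF_UP
from urllib.parse import parse_qsl, urlencode

# Hypothetical names: replace with your own context schema and field.
TARGET_SCHEMA_PREFIX = "iglu:com.acme/my_context/jsonschema/"
TARGET_FIELD = "score"
STEP = Decimal("0.01")


def round_target_field(contexts: dict) -> dict:
    """Round TARGET_FIELD in matching entries of a contexts envelope."""
    for ctx in contexts.get("data", []):
        data = ctx.get("data", {})
        value = data.get(TARGET_FIELD)
        if (ctx.get("schema", "").startswith(TARGET_SCHEMA_PREFIX)
                and isinstance(value, (int, float))
                and not isinstance(value, bool)):
            data[TARGET_FIELD] = float(
                Decimal(repr(value)).quantize(STEP, rounding=ROUND_HALF_UP))
    return contexts


def b64url_decode(text: str) -> bytes:
    # Trackers usually strip the '=' padding; restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def fix_querystring(qs: str) -> str:
    """Return a copy of a tracker querystring with the field rounded."""
    fixed = []
    for key, value in parse_qsl(qs, keep_blank_values=True):
        if key == "cx":  # base64url-encoded contexts
            ctx = round_target_field(json.loads(b64url_decode(value)))
            value = b64url_encode(json.dumps(ctx).encode())
        elif key == "co":  # plain JSON contexts
            value = json.dumps(round_target_field(json.loads(value)))
        fixed.append((key, value))
    return urlencode(fixed)
```

Only entries whose schema matches the prefix are touched, so other contexts that happen to share the field name are left alone.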
I appreciate any help on this issue.
Thank you