Quick question for everyone!
We have a custom event for which we’ve inadvertently collected millions of data points, filling up our servers, S3, and Redshift.
The obvious first response is “What’s the harm in keeping the data? S3 is cheap!” We did consider that, but we’ve made a calculated business decision to remove the data anyway.
So my questions are:
- Can we simply truncate the shredded rows in the database tables they are separated into?
- Can we simply delete the lines for the related events in the archived EMR logs in S3? Could the remaining logs still be enriched at a later date? (Or should we clear out the bulky event metadata and keep the event rows?)
- Is there any cleanup that needs to be done on the collectors to remove this data? (I assume not, as these logs are dumped into S3 on the hour.)
- Are there any other places we should remove this extra data from to slim down our storage of these events?
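To make the second question concrete, here is a rough Python sketch of the kind of line-level filtering we have in mind. It assumes the archived logs are plain-text files with one event per line once downloaded and decompressed, and that some string uniquely identifies the unwanted event within a line. The marker `my_custom_event`, the function names, and the file paths are hypothetical placeholders, and the sketch only works on local copies; it does not touch S3 itself.

```python
from typing import Iterable, Iterator

# Hypothetical marker: whatever string uniquely identifies the unwanted
# custom event in a log line (an assumption, not a real field name).
UNWANTED_MARKER = "my_custom_event"


def drop_event_lines(lines: Iterable[str], marker: str = UNWANTED_MARKER) -> Iterator[str]:
    """Yield every line that does not contain the marker."""
    for line in lines:
        if marker not in line:
            yield line


def filter_log_file(src_path: str, dst_path: str, marker: str = UNWANTED_MARKER) -> int:
    """Copy src_path to dst_path without the unwanted lines.

    Returns the number of lines that were removed.
    """
    removed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if marker in line:
                removed += 1  # skip lines belonging to the unwanted event
            else:
                dst.write(line)
    return removed
```

Writing to a separate destination file rather than editing in place means the original log stays untouched until the filtered copy has been checked. A plain substring match could also drop unrelated lines that happen to contain the marker, so the marker would need to be chosen carefully.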
Thanks in advance!