Hi @gareth - some thoughts (more structured blog post to follow):
- We want to make it easier to capture “consent” as an event. This should let anyone working with the data query an individual user’s records directly to understand what is and is not permissible to do with that data. (So the consent lives as part of the data it governs.)
- There may be opportunities to get users to self-identify, if that means that data controllers can more effectively guarantee their rights under GDPR.
- We want to be able to pseudonymize any field that might contain personal data. This would include IP addresses, but it could also include cookie IDs and user-defined fields in specific self-describing events or custom contexts.
Pseudonymization is really powerful because it means you can still collect the data and use it for analytical purposes; you just can’t then tie it back to the user to, e.g., personalize their user experience with it. So ideally, we’d have an enrichment that:
- Let you specify which fields to pseudonymize
- Had some logic to determine which events to run on. (So you only pseudonymize where you don’t have consent, for example.)
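To make the two requirements above concrete, here is a rough Python sketch of what such an enrichment might do. It is not an existing Snowplow enrichment: the `consent_given` flag, the choice of fields, the secret key and the use of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical configuration: the field list and key are illustrative only.
FIELDS_TO_PSEUDONYMIZE = ["user_ipaddress", "network_userid"]
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 hash (HMAC) of the value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def should_pseudonymize(event: dict) -> bool:
    """Run only on events with no recorded consent.

    Assumes a boolean 'consent_given' key on the event; a missing key is
    treated as 'no consent'.
    """
    return not event.get("consent_given", False)


def enrich(event: dict, fields=FIELDS_TO_PSEUDONYMIZE) -> dict:
    """Return a copy of the event with the configured fields hashed
    when consent is missing; otherwise return an unchanged copy."""
    out = dict(event)
    if not should_pseudonymize(event):
        return out
    for field in fields:
        value = out.get(field)
        if value is not None:
            out[field] = pseudonymize_value(str(value))
    return out
```

Because the hash is keyed and deterministic, the same input always maps to the same token, so you could still count distinct users or join events for analysis without storing the original value.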
Ideally this would happen upstream of writing out the collector logs, ensuring that where you don’t have consent you don’t hold any personally identifiable data. However, it’s pretty hard to deliver that level of functionality on the event without first processing it. So your suggestion of deleting the raw collector logs has its appeal. You’d have to be careful with any bad rows as well, since they can contain the same personal data.
I need to do some more thinking on how to meet GDPR obligations and keep some of the robustness that comes with being able to reprocess the event stream from scratch and recover bad events safely. Any ideas from the community appreciated!