I want to decrypt the values in a user_id column in Redshift, which contains user IDs encrypted by the PII pseudonymization enrichment.
Is there any way to achieve this?
@Dev - it’s not encryption, but rather a one-way hash function. It’s virtually impossible to derive the original input directly from the hashed value. Also, since you’re currently hashing the User ID, you may want to think about the legality of revealing these user IDs in your situation; it sounds like you hashed them for a reason.
However, that aside…
If you generate (or have) a list of the User IDs (along with any salt you may have used), you can easily reverse the hashes: hash each candidate ID the same way and match the result against the stored values.
I’ll share a practical example in a moment when I switch computers (at Mint Metrics, we reverse a couple of internal IPs’ hashes for use in report filtering).
I cobbled together a quick Python script to reverse hashed data for IPs. However, it could also be used or modified to work with your User IDs.
It accepts input via arguments or stdin - so it’s not really limited to just IPs.
import hashlib, os, socket, struct, sys

# Salt used by the PII enrichment; falls back to an empty string if unset
salt = os.getenv('SNOWPLOW_PII_SALT', '')

def hash_ip(input_ip):
    # MD5 of the input value with the salt appended
    hashed_ip = hashlib.md5()
    hashed_ip.update(f'{input_ip}{salt}'.encode('utf-8'))
    return hashed_ip.hexdigest()

def hash_range(start, end):
    # Hash every IPv4 address from start up to, but not including, end
    start = struct.unpack('>I', socket.inet_aton(start))[0]
    end = struct.unpack('>I', socket.inet_aton(end))[0]
    return [hash_ip(socket.inet_ntoa(struct.pack('>I', i))) for i in range(start, end)]

if __name__ == '__main__':
    if len(sys.argv) == 2:
        # Single value passed as an argument
        print(hash_ip(sys.argv[1]))
    elif len(sys.argv) == 3:
        # Start and end of an IP range passed as two arguments
        for hashed_ip in hash_range(sys.argv[1], sys.argv[2]):
            print(hashed_ip)
    else:
        # One value per line on stdin
        for line in sys.stdin:
            print(hash_ip(line.strip('\n')))
Save that as ip_tools.py, then run:
export SNOWPLOW_PII_SALT=LotsAndlotsOfDeliciousSalt
python ip_tools.py "$USER_ID"
Or, if you have a list of user IDs, you can feed it via stdin and write the hashes to a file:
cat list_of_user_ids | python ip_tools.py > output_file
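Note that the script above only produces hashes; to actually get back to a user ID you still need to match those hashes against the values stored in Redshift. Here is a minimal sketch of that lookup step. It assumes the same scheme as the script above (MD5 over the value with the salt appended), which may not match how your enrichment is configured, and the names `hash_value`, `build_lookup`, and `reverse_hashes`, as well as the sample IDs, are made up for illustration.

```python
import hashlib


def hash_value(value, salt):
    # Assumption: MD5 of the value with the salt appended, as in ip_tools.py
    return hashlib.md5(f'{value}{salt}'.encode('utf-8')).hexdigest()


def build_lookup(candidates, salt):
    # Map each hash back to the plain value that produced it
    return {hash_value(c, salt): c for c in candidates}


def reverse_hashes(hashed_values, lookup):
    # Original value for each hash, or None when no candidate matched
    return [lookup.get(h) for h in hashed_values]


if __name__ == '__main__':
    salt = 'LotsAndlotsOfDeliciousSalt'
    known_ids = ['user-1', 'user-2', 'user-3']  # hypothetical candidate IDs
    lookup = build_lookup(known_ids, salt)

    # Stand-in for hashed values pulled from the user_id column
    stored = [hash_value('user-2', salt), hash_value('someone-else', salt)]
    print(reverse_hashes(stored, lookup))  # ['user-2', None]
```

Anything that was never in your candidate list comes back as None, so this only recovers IDs you already know or can enumerate.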
Thanks…
@robkingston thanks… that was a great help.