Retrieve User_ids encrypted by enrichment


#1

I want to decrypt the user IDs in a user_id column in Redshift. The column contains encrypted user IDs produced by the PII pseudonymization enrichment.

Is there any way to achieve this?


#2

@Dev - it’s not encryption, but rather a one-way hash function. It’s virtually impossible to derive the original input from the hashed value alone. Also, considering you’re currently hashing the User ID, you may want to think about the legality of revealing these user IDs in your situation, since it sounds like you hashed them for a reason.

However, that aside…

If you generate (or have) a list of the candidate User IDs (along with any salt you may have used), you can easily reverse the hashes: hash each candidate the same way and match the results against the stored values.
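To make that concrete, here is a minimal sketch of the lookup approach. The function names are hypothetical, and it assumes MD5 over the value with the salt appended; check which hash function and salt your enrichment is actually configured with.

```python
import hashlib


def hash_value(value, salt=""):
    # Assumption: MD5 over the value with the salt appended.
    # Match this to your own enrichment configuration.
    return hashlib.md5(f"{value}{salt}".encode("utf-8")).hexdigest()


def build_lookup(candidates, salt=""):
    # Map each hash back to the candidate value that produced it.
    return {hash_value(c, salt): c for c in candidates}


def reverse_hashes(hashed_values, candidates, salt=""):
    # Hashes with no matching candidate come back as None.
    lookup = build_lookup(candidates, salt)
    return [lookup.get(h) for h in hashed_values]
```

This only recovers IDs that appear in your candidate list. Anything else stays unknown, which is the point of the hash.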

I’ll share a practical example in a moment when I switch computers (at Mint Metrics, we reverse a couple of internal IPs’ hashes for use in report filtering).


#3

I cobbled together a quick Python script to reverse hashed data for IPs. However, it could also be used or modified to work with your User IDs.

It accepts input via arguments or stdin - so it’s not really limited to just IPs.

import hashlib
import os
import socket
import struct
import sys

# Salt used by the PII enrichment. Falls back to an empty string if the
# variable is unset, so the literal text "None" is never hashed in.
salt = os.getenv('SNOWPLOW_PII_SALT', '')


def hash_ip(input_ip):
    """Return the MD5 hex digest of the input value with the salt appended."""
    hashed_ip = hashlib.md5()
    hashed_ip.update(f'{input_ip}{salt}'.encode('utf-8'))
    return hashed_ip.hexdigest()


def hash_range(start, end):
    """Hash every IPv4 address from start up to, but not including, end."""
    start = struct.unpack('>I', socket.inet_aton(start))[0]
    end = struct.unpack('>I', socket.inet_aton(end))[0]
    return [hash_ip(socket.inet_ntoa(struct.pack('>I', i))) for i in range(start, end)]


if __name__ == '__main__':
    if len(sys.argv) == 2:
        # One argument: hash a single value.
        print(hash_ip(sys.argv[1]))
    elif len(sys.argv) == 3:
        # Two arguments: hash an IPv4 range.
        for hashed_ip in hash_range(sys.argv[1], sys.argv[2]):
            print(hashed_ip)
    else:
        # Otherwise: hash each line read from stdin.
        for line in sys.stdin:
            print(hash_ip(line.strip('\n')))


  1. Add it to a file, e.g. ip_tools.py
  2. Add your salt to your environment, e.g. by adding export SNOWPLOW_PII_SALT=LotsAndlotsOfDeliciousSalt to your ~/.bash_profile
  3. (Optional) Create a virtual environment for Python 3.6 or later if you’re not running it by default (the script uses f-strings)
  4. Run it with the syntax python ip_tools.py $USER_ID. If you have a list of user IDs, you can feed it via stdin and write the output to a file: cat list_of_user_ids | python ip_tools.py > output_file
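
Note that the script prints only the hashes, so the output file on its own doesn't tell you which ID produced which hash. A rough sketch of a variant that writes both side by side as CSV might look like this. The names write_mapping and hashed_user_id are made up for illustration, and it assumes the same MD5 over value plus salt as ip_tools.py.

```python
import csv
import hashlib
import io


def hash_value(value, salt):
    # Assumption: same MD5(value + salt) scheme as ip_tools.py.
    return hashlib.md5(f"{value}{salt}".encode("utf-8")).hexdigest()


def write_mapping(lines, out, salt):
    # Write one "hashed_user_id,user_id" row per non-empty input line
    # and return the number of IDs written.
    writer = csv.writer(out)
    writer.writerow(["hashed_user_id", "user_id"])
    count = 0
    for line in lines:
        user_id = line.strip()
        if user_id:
            writer.writerow([hash_value(user_id, salt), user_id])
            count += 1
    return count


if __name__ == "__main__":
    # Small in-memory demo; swap in sys.stdin / sys.stdout for real use.
    buffer = io.StringIO()
    write_mapping(["user-1\n", "user-2\n"], buffer, "ExampleSalt")
    print(buffer.getvalue())
```

You could then load that file into a table and join it against the hashed user_id column to get the original IDs back.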

#4

Thanks…


#5

@robkingston thanks… that was a great help