PII enrichment MD5 / SHA-1 salt values required for Redshift compatibility?


#1

Hi guys,

Just validating my config against the new schema for pii_enrichment_config/2-0-0 and noticed that salt is required under pseudonymize.

  1. What salt was used before this option was added in R106?
  2. What value should we use here to ensure it matches the output of Redshift’s MD5 and FUNC_SHA1 functions?

Cheers,
Rob


#2

Alright, did some reading up on hashing and it looks like I may have answered my own question (please correct me if I’ve misunderstood).

Tl;dr:

  1. Looks like no salt prior to R106
  2. Just pass an empty string as the salt, so that nothing is added to the hash function’s input

Salting a cryptographic hash function just extends the function’s input, so we only have to append the salt to the end of the plaintext. Below, we apply the salt “test” to the plaintext input “hello”:

select 
  md5('hellotest') as md5_salt_test, 
  md5('hello') as md5_no_salt;
md5_salt_test                     md5_no_salt
200f9319fc232a9254b275f0dcaad797  5d41402abc4b2a76b9719d911017c592

There doesn’t seem to be any standard way of applying salt, so you have to make sure it is applied the same way everywhere else you hash the data (e.g. hashing old data in SQL Runner). Judging from the enrichment source code, we simply have to append the salt to the string in Redshift to match the salt in the enrichment.
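
For anyone who wants to sanity-check this outside Redshift, here is a minimal Python sketch. It assumes, as described above, that the salt is simply appended to the plaintext and that the combined string is hashed as UTF-8. The salted_hash helper is a hypothetical name for illustration only and is not part of the enrichment.

```python
import hashlib


def salted_hash(plaintext: str, salt: str = "", algorithm: str = "md5") -> str:
    """Hash plaintext with the salt appended and return lowercase hex.

    Assumption: the salt is appended (plaintext + salt), matching the
    md5('hellotest') example above. An empty salt leaves the input unchanged.
    """
    data = (plaintext + salt).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


# Same input as md5('hellotest') in the SQL example
print(salted_hash("hello", "test"))

# Empty salt: same input as md5('hello')
print(salted_hash("hello"))

# SHA-1 variant with no salt, to compare against func_sha1('hello') in Redshift
print(salted_hash("hello", "", "sha1"))
```

If the hex strings printed here match what Redshift returns for the same inputs, the salt is being applied the same way on both sides.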


#3

Hey @robkingston,

That is correct.

If you want to read the background on adding salt, you can read the relevant issue here: https://github.com/snowplow/snowplow/issues/3648

As well as the wiki on this enrichment: https://github.com/snowplow/snowplow/wiki/PII-pseudonymization-enrichment


#4

Thanks @knservis - did a bit of reading up on that thread this afternoon.