I have two separate queries that generate tables that have the following structure:
user_id | 2017_spend | 2017_decile
user_id | 2016_spend | 2016_decile
where the decile columns is a 10% percentile grouping, generated using Redshift’s RANK() and NTILE window functions.
Now, I want to join these two into a single table that has every unique User ID in the first column, and their spend and decile information in the following 4 columns. This seems a straightforward task, but I have not been able to get this work as desired. Every JOIN that I do (LEFT, LEFT OUTER, FULL OUTER, INNER) results in a loss of data. Because if someone was a customer in 2016 but not in 2017 (or vice versa), then there will be no match and the JOIN will exclude them. And as I understand, a UNION will simply stack the two tables on top of each other, resulting in duplicate user_id’s.
Am I missing something basic? (It feels like I am) .
Any help let me know!