However when I generate these tables in my Postgres database, these keys are not being created. Is there any part of the command I am missing? dbt run --models snowplow_web tag:snowplow_web_incremental
The doc site only mentions testing for primary keys - dbt Docs
Unfortunately dbt’s unique_key functionality only identifies unique keys for dbt snapshots (see here), and since we don’t use snapshots in our packages we should probably remove that. In my research on this, however, I did stumble across this page which outlines how to add indices (or indexes) in Postgres. I’ve added an issue to the web package’s GitHub repository, and someone from the team will get round to it soon. Thanks for bringing this to our attention, and I hope this helps put into context why no primary keys are being created at the moment.
In the meantime, however, you should be able to use an ALTER TABLE statement to add primary keys to tables that are already created. For the snowplow_web_sessions table, for example, you can run the following command against your Postgres instance:
ALTER TABLE snowplow_web_sessions
ADD CONSTRAINT primary_sessionid
PRIMARY KEY (domain_sessionid);
and this should add the primary key constraint to the table. Since we are incrementally loading to this table in subsequent dbt runs, this constraint will not be removed from the table. I hope this helps in the meantime while we update our package, thanks again for bringing this up!
Let me know if you have any other questions and have a great day,
Emiel
Thank you @Emiel! And thank you for creating the ticket.
I was able to manually add the primary keys, like you mentioned but I got some push back from my team. In an ideal situation, dbt would do this for us. I don’t really see us tanking the whole db any time soon and re-initializing the snowplow_web generated tables, so this solution is viable.
Thank you again for creating this ticket and getting it looked at.
It may make more sense in this situation for there to be a surrogate (truly unique) key on the table. Although dbt can take care of creating primary keys there’s no guarantee that domain_sessionid will be globally unique and given that Postgres does enforce primary constraints there’s a risk that this might introduce some unintentional behaviour.