Snowplow_web data models not creating primary keys in Postgres

Hello all!

I am using the default data models from dbt snowplow_web. GitHub - snowplow/dbt-snowplow-web: A fully incremental model, that transforms raw web event data generated by the Snowplow JavaScript tracker into a series of derived tables of varying levels of aggregation.

I see that the data models identify primary keys - dbt-snowplow-web/snowplow_web_sessions.sql at b41150684b6a3d63789178b4299a7a5c3a37cd43 · snowplow/dbt-snowplow-web · GitHub

However when I generate these tables in my Postgres database, these keys are not being created. Is there any part of the command I am missing? dbt run --models snowplow_web tag:snowplow_web_incremental

The doc site only mentions testing for primary keys - dbt Docs

Any advice is appreciated. Thank you!

1 Like

Hey @wambam, welcome to the Snowplow community!

Unfortunately dbt’s unique_key functionality only identifies unique keys for dbt snapshots (see here), and since we don’t use snapshots in our packages we should probably remove that. In my research on this, however, I did stumble across this page which outlines how to add indices (or indexes) in Postgres. I’ve added an issue to the web package’s GitHub repository, and someone from the team will get round to it soon. Thanks for bringing this to our attention, and I hope this helps put into context why no primary keys are being created at the moment.

In the meantime, however, you should be able to use an ALTER TABLE statement to add primary keys to tables that are already created. For the snowplow_web_sessions table, for example, you can run the following command against your Postgres instance:

ALTER TABLE snowplow_web_sessions
     ADD CONSTRAINT primary_sessionid
          PRIMARY KEY (domain_sessionid);

and this should add the primary key constraint to the table. Since we are incrementally loading to this table in subsequent dbt runs, this constraint will not be removed from the table. I hope this helps in the meantime while we update our package, thanks again for bringing this up!

Let me know if you have any other questions and have a great day,
Emiel

4 Likes

Thank you @Emiel! And thank you for creating the ticket.

I was able to manually add the primary keys, like you mentioned but I got some push back from my team. In an ideal situation, dbt would do this for us. I don’t really see us tanking the whole db any time soon and re-initializing the snowplow_web generated tables, so this solution is viable.

Thank you again for creating this ticket and getting it looked at.

:beers:

It may make more sense in this situation for there to be a surrogate (truly unique) key on the table. Although dbt can take care of creating primary keys there’s no guarantee that domain_sessionid will be globally unique and given that Postgres does enforce primary constraints there’s a risk that this might introduce some unintentional behaviour.

1 Like