Snowplow-web 0.4.0 dbt package released

We are very happy to announce the release of the snowplow-web v0.4.0 dbt package.

This release brings support for the Postgres adapter (big thanks to @zloff!), a user mapping module and improved filtering of problematic long sessions.

Features

  • Postgres adapter support (#45)
  • A user mapping module, mapping between domain_userid and the latest user_id. This mapping is applied to the sessions table to produce a stitched_user_id (#42)

Improvements

  • Improved filtering of problematic long sessions, reducing table scans and the chance of duplicate sessions (#41)
  • All manifest tables are now created using dbt models rather than DDL. This allows these tables to be dropped using dbt’s full-refresh flag and to appear in the lineage graph (#39)
  • Improved Redshift event dedupe logic (#33)

Fixes

  • Fix cluster_by_fields macros to allow overriding (#35)

Under the hood

  • Refactor BigQuery page view enrichments (#40)
  • Add a GH Action to test all PRs (#43)

Breaking changes

  • The user mapping module changes the schema of the sessions table by adding the stitched_user_id column.
  • The snowplow_delete_from_manifest macro has been replaced with the snowplow_web_delete_from_manifest macro. This should only affect users of the package who run this macro as an operation. Refer to the README for more information.
  • The snowplow__manifest_custom_schema var has been deprecated. The schema for all manifest tables is now set directly in the dbt_project.yml file. If you had previously set a custom manifest schema, you will need to update your dbt_project.yml file to reflect this. Please refer to the Output Schema section of the docs for more info.
  • The mechanism to tear down all the manifest tables and start afresh has changed. This can now be achieved by using the native dbt full-refresh flag when running the manifest tables, rather than the now deprecated teardown_all var. Note that, due to their critical nature, the manifest tables are protected from accidental full refreshes in production. Please refer to the Manifest Tables section for more details.
  • BigQuery Only: This release imports snowplow-utils v0.4.0 which introduced a breaking change to the combine_column_versions macro. If using this macro for modelling please update accordingly.
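
For the manifest schema change above, a rough sketch of what the dbt_project.yml entry might look like is shown here. The folder path (base, then manifest) is an assumption about how the package lays out its manifest models, and my_manifest_schema is a hypothetical schema name. Check the Output Schema section of the docs for the exact keys before copying this.

```yaml
# dbt_project.yml (sketch only; the model path is an assumption)
models:
  snowplow_web:
    base:
      manifest:
        # Hypothetical schema name, replacing the value previously
        # passed via the snowplow__manifest_custom_schema var
        +schema: my_manifest_schema
```

The +schema key is standard dbt model configuration, so the manifest tables follow the same custom schema rules as the rest of your project.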

Upgrading

To upgrade, bump the version of the package in your packages.yml file.
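
As a minimal sketch, the packages.yml entry could look like the following. It assumes you install the package from the dbt package hub under the name snowplow/snowplow_web. If you install it from Git instead, update the revision there.

```yaml
# packages.yml (sketch; assumes installation from the dbt package hub)
packages:
  - package: snowplow/snowplow_web
    version: 0.4.0
```

After changing the version, run dbt deps so that dbt downloads the new release.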

This release contains breaking changes, namely to the sessions table schema. As a result, you will be required to do a full refresh of the package:

dbt run --models snowplow_web --full-refresh --vars 'snowplow__allow_refresh: true'