This package currently supports only Redshift. BigQuery and Snowflake support will follow in later releases.
The package:
- Processes events incrementally.
- Is designed in a modular manner, allowing you to easily integrate your own custom SQL into the incremental framework provided by the package.
- Includes a custom incremental materialisation to reduce table scans during the upsert stage.
- Includes a full suite of tests to ensure data integrity.
- Includes a comprehensive data dictionary.
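The README does not describe how the custom materialisation limits table scans. As a rough illustration of the general idea only, the Python sketch below models a target table partitioned by day and restricts the delete stage of an upsert to the partitions spanned by the incoming rows, plus an optional lookback. The function name, the `start_date`/`session_id` columns and the `lookback_days` parameter are hypothetical; this is not the package's actual macro.

```python
from datetime import date, timedelta


def upsert_with_limited_scan(target, source, date_key="start_date",
                             unique_key="session_id", lookback_days=0):
    """Upsert `source` rows into `target`, a dict of {day: {key: row}}.

    Hypothetical sketch: during the delete stage only partitions between the
    earliest source date (minus `lookback_days`) and the latest source date
    are visited. Returns the list of partition days that were scanned.
    """
    if not source:
        return []
    lower = min(row[date_key] for row in source) - timedelta(days=lookback_days)
    upper = max(row[date_key] for row in source)
    incoming_keys = {row[unique_key] for row in source}

    # Delete stage: walk day by day over the bounded window only.
    scanned = []
    day = lower
    while day <= upper:
        partition = target.get(day)
        if partition is not None:
            scanned.append(day)
            for key in [k for k in partition if k in incoming_keys]:
                del partition[key]
        day += timedelta(days=1)

    # Insert stage: write every source row into its own day partition.
    for row in source:
        target.setdefault(row[date_key], {})[row[unique_key]] = row
    return scanned
```

The bounded window is what avoids scanning the whole target. The trade-off is that an existing row whose date key falls outside the window would not be deleted, which is why some lookback margin may be needed in practice.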
This package consists of a series of modules, each producing a table which serves as the input to the next module.
The ‘standard’ modules are:
- Base: Performs the incremental logic, outputting the table `snowplow_web_base_events_this_run`, which contains a de-duplicated dataset of all events required for the current run of the model.
- Page Views: Aggregates event-level data to a page view level.
- Sessions: Aggregates page view level data to a session level.
- Users: Aggregates session level data to a user level.
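To make the chaining concrete, here is a minimal Python sketch in which each module is a function whose output feeds the next. The column set (`page_view_id`, `domain_sessionid`, `domain_userid`) and the single counting metric per level are simplifying assumptions; the real modules are SQL models producing much richer tables.

```python
def page_views_module(events):
    """Aggregate event-level rows to one row per page view."""
    page_views = {}
    for event in events:
        row = page_views.setdefault(event["page_view_id"], {
            "page_view_id": event["page_view_id"],
            "domain_sessionid": event["domain_sessionid"],
            "domain_userid": event["domain_userid"],
            "events": 0,
        })
        row["events"] += 1
    return list(page_views.values())


def sessions_module(page_views):
    """Aggregate page-view rows to one row per session."""
    sessions = {}
    for pv in page_views:
        row = sessions.setdefault(pv["domain_sessionid"], {
            "domain_sessionid": pv["domain_sessionid"],
            "domain_userid": pv["domain_userid"],
            "page_views": 0,
        })
        row["page_views"] += 1
    return list(sessions.values())


def users_module(sessions):
    """Aggregate session rows to one row per user."""
    users = {}
    for session in sessions:
        row = users.setdefault(session["domain_userid"], {
            "domain_userid": session["domain_userid"],
            "sessions": 0,
            "page_views": 0,
        })
        row["sessions"] += 1
        row["page_views"] += session["page_views"]
    return list(users.values())
```

The modules are then composed in order, for example `users_module(sessions_module(page_views_module(events)))`, mirroring how each table serves as the input to the next module.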
The majority of the incremental logic sits within the base module, which performs the ‘heavy lifting’ for you. The logic is as follows:
- Identify new or late-arriving events since the last run of the package.
- Identify the `domain_sessionid` associated with these new or late events.
- Reprocess all events associated with each of these `domain_sessionid` values. This ensures that, when aggregating to a session level, we have all the events associated with the session.
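The three steps above, followed by de-duplication, can be sketched in Python over an in-memory list of events. This is a simplified illustration under stated assumptions: "new or late" is taken to mean a `load_tstamp` after the previous run's high-water mark, and duplicates are resolved by keeping the earliest `collector_tstamp` per `event_id`. The function name is hypothetical, and the package's own state tracking and timestamp bounds are not modelled.

```python
def base_events_this_run(all_events, last_load_tstamp):
    """Select and de-duplicate the events needed for the current run.

    Simplified sketch: scans every event in memory; the real package's
    state tracking and query bounds are not modelled here.
    """
    # Step 1: new or late-arriving events, i.e. loaded since the last run (assumed rule).
    new_events = [e for e in all_events if e["load_tstamp"] > last_load_tstamp]

    # Step 2: the sessions those events belong to.
    session_ids = {e["domain_sessionid"] for e in new_events}

    # Step 3: every event in those sessions, including ones loaded in earlier runs.
    session_events = [e for e in all_events if e["domain_sessionid"] in session_ids]

    # De-duplicate on event_id, keeping the earliest collector_tstamp (assumed rule).
    deduped = {}
    for event in sorted(session_events, key=lambda e: e["collector_tstamp"]):
        deduped.setdefault(event["event_id"], event)
    return list(deduped.values())
```

Step 3 is what lets downstream session-level aggregation be recomputed from scratch: an event loaded in an earlier run (`e1` in a session that receives a late event) is pulled back in alongside the new one.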
This de-duplicated dataset is then written to an `events_this_run` table, containing all the required events for the given run of the web model.
Using the `events_this_run` table removes complexity when adding your own customisations. You can write drop-and-recompute style SQL using `events_this_run` as a source, without having to worry about which events to select.
Furthermore, this reduces cost and improves performance. Since the `events_this_run` table is shared between the standard modules and your customisations, there is no need to query the raw events table multiple times. For more information on writing custom SQL, please refer to the docs. An example dbt project demonstrating customisations can also be found within the repo.
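As an illustration of the drop-and-recompute pattern, the sketch below recomputes a per-session metric purely from `events_this_run` rows and then overwrites the matching rows of a derived table. Because `events_this_run` holds every event for each session in the run, the recomputed value replaces the old one rather than being added to it. The metric, the `event_name` filter value, the function names and the dict standing in for a table are illustrative assumptions, written in Python rather than the SQL a real customisation would use.

```python
def link_clicks_per_session(events_this_run):
    """Recompute link-click counts from scratch for every session in this run."""
    counts = {}
    for event in events_this_run:
        session_id = event["domain_sessionid"]
        counts.setdefault(session_id, 0)
        if event["event_name"] == "link_click":
            counts[session_id] += 1
    return counts


def upsert_sessions(derived_table, recomputed):
    """Replace rows for the sessions in this run; leave all other sessions untouched."""
    derived_table.update(recomputed)
    return derived_table
```

For example, if session `s1` previously had one link click and this run contains that earlier click plus a late-arriving one, the recomputed value is 2, not 1 + 2, and session `s2`, which is absent from the run, keeps its existing value.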
Check out the web data model section of the Snowplow Docs for more information on the model's structure.
Check out the snowplow-web package docs for a quickstart guide, as well as an explanation of how to operate and configure the package.