Compilation Error in model snowplow_web_sessions_this_run

Hello guys,

I’m trying to follow the Advanced Analytics for Web accelerator, but got stuck:

$ dbt run --selector snowplow_web
11:36:43  Running with dbt=1.7.8
11:36:47  Registered adapter: bigquery=1.7.6
11:36:49  Found 20 models, 107 tests, 3 seeds, 2 operations, 8 sources, 0 exposures, 0 metrics, 747 macros, 0 groups, 0 semantic models
11:36:49
11:36:57
11:36:57  Running 1 on-run-start hook
11:36:57  1 of 1 START hook: snowplow_web.on-run-start.0 ................................. [RUN]
11:36:57  1 of 1 OK hook: snowplow_web.on-run-start.0 .................................... [OK in 0.00s]
11:36:57
11:36:57  Concurrency: 4 threads (target='dev')
11:36:57
11:36:57  1 of 18 START sql incremental model atomic_snowplow_manifest.snowplow_web_base_quarantined_sessions  [RUN]
11:36:57  2 of 18 START sql incremental model atomic_snowplow_manifest.snowplow_web_incremental_manifest  [RUN]
11:37:00  2 of 18 OK created sql incremental model atomic_snowplow_manifest.snowplow_web_incremental_manifest  [CREATE TABLE (0.0 rows, 0 processed) in 3.17s]
11:37:00  1 of 18 OK created sql incremental model atomic_snowplow_manifest.snowplow_web_base_quarantined_sessions  [CREATE TABLE (0.0 rows, 0 processed) in 3.17s]
11:37:00  3 of 18 START sql table model atomic_scratch.snowplow_web_base_new_event_limits  [RUN]
11:37:03  12:37:03 + Snowplow: No data in manifest. Processing data from start_date
11:37:07  12:37:07 + Snowplow: Processing data between '2022-08-19 00:00:00' and '2022-09-18 00:00:00' (snowplow_web)
11:37:07  3 of 18 OK created sql table model atomic_scratch.snowplow_web_base_new_event_limits  [CREATE TABLE (1.0 rows, 0 processed) in 6.99s]
11:37:07  4 of 18 START sql incremental model atomic_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest  [RUN]
11:37:14  4 of 18 OK created sql incremental model atomic_snowplow_manifest.snowplow_web_base_sessions_lifecycle_manifest  [CREATE TABLE (1.3k rows, 1.6 MiB processed) in 7.15s]
11:37:14  5 of 18 START sql table model atomic_scratch.snowplow_web_base_sessions_this_run  [RUN]
11:37:22  5 of 18 OK created sql table model atomic_scratch.snowplow_web_base_sessions_this_run  [CREATE TABLE (1.3k rows, 115.0 KiB processed) in 7.68s]
11:37:22  6 of 18 START sql table model atomic_scratch.snowplow_web_base_events_this_run . [RUN]
11:37:28  6 of 18 OK created sql table model atomic_scratch.snowplow_web_base_events_this_run  [CREATE TABLE (15.9k rows, 23.0 MiB processed) in 6.22s]
11:37:28  7 of 18 START sql table model atomic_scratch.snowplow_web_pv_engaged_time ...... [RUN]
11:37:28  8 of 18 START sql table model atomic_scratch.snowplow_web_pv_scroll_depth ...... [RUN]
11:37:28  9 of 18 START sql table model atomic_scratch.snowplow_web_sessions_this_run .... [RUN]
11:37:28  10 of 18 START sql incremental model atomic_derived.snowplow_web_user_mapping .. [RUN]
11:37:29  9 of 18 ERROR creating sql table model atomic_scratch.snowplow_web_sessions_this_run  [ERROR in 0.84s]
11:37:32  7 of 18 OK created sql table model atomic_scratch.snowplow_web_pv_engaged_time . [CREATE TABLE (1.1k rows, 1.6 MiB processed) in 3.35s]
11:37:32  8 of 18 OK created sql table model atomic_scratch.snowplow_web_pv_scroll_depth . [CREATE TABLE (2.0k rows, 1.9 MiB processed) in 3.45s]
11:37:32  11 of 18 START sql table model atomic_scratch.snowplow_web_page_views_this_run . [RUN]
11:37:32  10 of 18 OK created sql incremental model atomic_derived.snowplow_web_user_mapping  [CREATE TABLE (0.0 rows, 712.9 KiB processed) in 3.94s]
11:37:32  12 of 18 SKIP relation atomic_derived.snowplow_web_sessions .................... [SKIP]
11:37:32  11 of 18 ERROR creating sql table model atomic_scratch.snowplow_web_page_views_this_run  [ERROR in 0.50s]
11:37:32  13 of 18 SKIP relation atomic_scratch.snowplow_web_users_sessions_this_run ..... [SKIP]
11:37:32  14 of 18 SKIP relation atomic_derived.snowplow_web_page_views .................. [SKIP]
11:37:32  15 of 18 SKIP relation atomic_scratch.snowplow_web_users_aggs .................. [SKIP]
11:37:32  16 of 18 SKIP relation atomic_scratch.snowplow_web_users_lasts ................. [SKIP]
11:37:32  17 of 18 SKIP relation atomic_scratch.snowplow_web_users_this_run .............. [SKIP]
11:37:32  18 of 18 SKIP relation atomic_derived.snowplow_web_users ....................... [SKIP]
11:37:32
11:37:32  Running 1 on-run-end hook
11:37:32  1 of 1 START hook: snowplow_web.on-run-end.0 ................................... [RUN]
11:37:36  1 of 1 OK hook: snowplow_web.on-run-end.0 ...................................... [MERGE (3.0 rows, 124.0 KiB processed) in 3.29s]
11:37:36
11:37:36
11:37:36  Finished running 7 incremental models, 11 table models, 2 hooks in 0 hours 0 minutes and 47.03 seconds (47.03s).
11:37:36
11:37:36  Completed with 2 errors and 0 warnings:
11:37:36
11:37:36    Compilation Error in model snowplow_web_sessions_this_run (models/sessions/scratch/bigquery/snowplow_web_sessions_this_run.sql)
  Snowplow: No columns found with prefix contexts_com_iab_snowplow_spiders_and_robots_1

  > in macro get_columns_in_relation_by_column_prefix (macros/utils/get_columns_in_relation_by_column_prefix.sql)
  > called by macro combine_column_versions (macros/utils/bigquery/combine_column_versions/combine_column_versions.sql)
  > called by macro get_optional_fields (macros/utils/bigquery/get_optional_fields.sql)
  > called by model snowplow_web_sessions_this_run (models/sessions/scratch/bigquery/snowplow_web_sessions_this_run.sql)
11:37:36
11:37:36    Compilation Error in model snowplow_web_page_views_this_run (models/page_views/scratch/bigquery/snowplow_web_page_views_this_run.sql)
  Snowplow: No columns found with prefix contexts_com_iab_snowplow_spiders_and_robots_1

  > in macro get_columns_in_relation_by_column_prefix (macros/utils/get_columns_in_relation_by_column_prefix.sql)
  > called by macro combine_column_versions (macros/utils/bigquery/combine_column_versions/combine_column_versions.sql)
  > called by macro get_optional_fields (macros/utils/bigquery/get_optional_fields.sql)
  > called by model snowplow_web_page_views_this_run (models/page_views/scratch/bigquery/snowplow_web_page_views_this_run.sql)
11:37:36
11:37:36  Done. PASS=9 WARN=0 ERROR=2 SKIP=7 TOTAL=18

Could you please take a look and advise?

Just in case: I’m using version 1.0.0 of snowplow/snowplow_web, as per the accelerator’s docs. While running dbt deps I noticed a mention of a newer snowplow/snowplow_utils version:

$ dbt deps
11:29:17  Running with dbt=1.7.8
11:29:19  Updating lock file in file path: /Users/xak/dev/my/yuji/snowplow-advanced-analytics-for-web/dbt_sp/package-lock.yml
11:29:19  Installing snowplow/snowplow_web
11:29:19  Installed from version 1.0.0
11:29:19  Up to date!
11:29:19  Installing snowplow/snowplow_utils
11:29:20  Installed from version 0.15.2
11:29:20  Updated version available: 0.16.2
11:29:20  Installing dbt-labs/dbt_utils
11:29:21  Installed from version 1.1.1
11:29:21  Up to date!
11:29:21
11:29:21  Updates available for packages: ['snowplow/snowplow_utils']
Update your versions in packages.yml, then run dbt deps

Thank you.

Do you have this column in your table? What is the value of snowplow__enable_iab in your dbt_project.yml file?
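If it helps, you can check directly in the column metadata (a quick sketch, assuming your events table is atomic.events):

-- List any columns matching the prefix the package looks for,
-- showing their exact stored casing.
select column_name
from atomic.INFORMATION_SCHEMA.COLUMNS
where table_name = 'events'
  and lower(column_name) like 'contexts_com_iab_snowplow_spiders_and_robots_1%';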

I think I do, if I understand the question correctly:

These were the last lines in my dbt_project.yml:

vars:
  snowplow_web:
    snowplow__start_date: '2022-08-19'
    snowplow__enable_iab: true
    snowplow__enable_ua: true
    snowplow__enable_yauaa: true
    snowplow__events: 'atomic.events'

But later, after I got the errors above, I set all three to false just to see whether the transformations would run without them, and they did.

Thank you.

I’m glad you managed to resolve it, but I would recommend not altering the snowplow__events variable, as it is used in many places throughout the package. You should use the snowplow__atomic_schema, snowplow__database, and snowplow__events_table variables to alter those details instead, if required.

Hi @Ryan.

I’m confused. I don’t think I was able to resolve anything yet. My understanding is that I just skipped the enrichments completely, as I set all three to false, right?

Also, I didn’t get your remark about changing the snowplow__events variable. What do you mean? I just set it to the location of my events table in BigQuery.

Thank you.

The snowplow__events variable shouldn’t be altered; it’s used internally by the package, and overriding it directly can lead to unintended outcomes. You should set the location of your events table using the other variables I suggested.

You can see our variables here: Web | Snowplow Documentation, and these specific ones are highlighted in the quickstart: Web Quickstart | Snowplow Documentation

Once you have set those, it may correctly pick up those columns, as it may currently not be reading from the correct table throughout.
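For example, the relevant block in dbt_project.yml would look something like this (a sketch — the project id here is a placeholder; adjust all three values to wherever your events actually live):

vars:
  snowplow_web:
    snowplow__atomic_schema: 'atomic'        # dataset containing the events table
    snowplow__database: 'my-gcp-project'     # placeholder GCP project id
    snowplow__events_table: 'events'         # table name within that dataset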

Are you saying that I have to set it as in the Accelerator doc, i.e. to atomic.sample_events?

Ah, the accelerator may be a little out of date; I’ll look into getting that updated. Although, as the defaults are atomic and events anyway, that should be fine as is. Are you able to look at atomic_scratch.snowplow_web_base_events_this_run and see if that table has the context columns as expected?

I see, thank you.

Here’s what I have:

I see the issue now, thanks. Unfortunately BQ is case-sensitive for column names, and we look for a lowercase column prefix (our standard Snowplow loader loads lowercase column names into BigQuery). The easiest way to resolve this for the accelerator data is to replace the table with a version that has lowercase column names, which I have verified allows the models to run. The query should be:

create or replace table atomic.events_lower as
select
  APP_ID as app_id, PLATFORM as platform, ETL_TSTAMP as etl_tstamp, COLLECTOR_TSTAMP as collector_tstamp, DVCE_CREATED_TSTAMP as dvce_created_tstamp,
  EVENT as event, EVENT_ID as event_id, TXN_ID as txn_id, NAME_TRACKER as name_tracker, V_TRACKER as v_tracker, V_COLLECTOR as v_collector, V_ETL as v_etl,
  USER_ID as user_id, USER_IPADDRESS as user_ipaddress, USER_FINGERPRINT as user_fingerprint, DOMAIN_USERID as domain_userid, DOMAIN_SESSIONIDX as domain_sessionidx, NETWORK_USERID as network_userid,
  GEO_COUNTRY as geo_country, GEO_REGION as geo_region, GEO_CITY as geo_city, GEO_ZIPCODE as geo_zipcode, GEO_LATITUDE as geo_latitude, GEO_LONGITUDE as geo_longitude, GEO_REGION_NAME as geo_region_name,
  IP_ISP as ip_isp, IP_ORGANIZATION as ip_organization, IP_DOMAIN as ip_domain, IP_NETSPEED as ip_netspeed,
  PAGE_URL as page_url, PAGE_TITLE as page_title, PAGE_REFERRER as page_referrer, PAGE_URLSCHEME as page_urlscheme, PAGE_URLHOST as page_urlhost, PAGE_URLPORT as page_urlport, PAGE_URLPATH as page_urlpath, PAGE_URLQUERY as page_urlquery, PAGE_URLFRAGMENT as page_urlfragment,
  REFR_URLSCHEME as refr_urlscheme, REFR_URLHOST as refr_urlhost, REFR_URLPORT as refr_urlport, REFR_URLPATH as refr_urlpath, REFR_URLQUERY as refr_urlquery, REFR_URLFRAGMENT as refr_urlfragment, REFR_MEDIUM as refr_medium, REFR_SOURCE as refr_source, REFR_TERM as refr_term,
  MKT_MEDIUM as mkt_medium, MKT_SOURCE as mkt_source, MKT_TERM as mkt_term, MKT_CONTENT as mkt_content, MKT_CAMPAIGN as mkt_campaign,
  SE_CATEGORY as se_category, SE_ACTION as se_action, SE_LABEL as se_label, SE_PROPERTY as se_property, SE_VALUE as se_value,
  TR_ORDERID as tr_orderid, TR_AFFILIATION as tr_affiliation, TR_TOTAL as tr_total, TR_TAX as tr_tax, TR_SHIPPING as tr_shipping, TR_CITY as tr_city, TR_STATE as tr_state, TR_COUNTRY as tr_country,
  TI_ORDERID as ti_orderid, TI_SKU as ti_sku, TI_NAME as ti_name, TI_CATEGORY as ti_category, TI_PRICE as ti_price, TI_QUANTITY as ti_quantity,
  PP_XOFFSET_MIN as pp_xoffset_min, PP_XOFFSET_MAX as pp_xoffset_max, PP_YOFFSET_MIN as pp_yoffset_min, PP_YOFFSET_MAX as pp_yoffset_max,
  USERAGENT as useragent, BR_NAME as br_name, BR_FAMILY as br_family, BR_VERSION as br_version, BR_TYPE as br_type, BR_RENDERENGINE as br_renderengine, BR_LANG as br_lang,
  BR_FEATURES_PDF as br_features_pdf, BR_FEATURES_FLASH as br_features_flash, BR_FEATURES_JAVA as br_features_java, BR_FEATURES_DIRECTOR as br_features_director, BR_FEATURES_QUICKTIME as br_features_quicktime, BR_FEATURES_REALPLAYER as br_features_realplayer, BR_FEATURES_WINDOWSMEDIA as br_features_windowsmedia, BR_FEATURES_GEARS as br_features_gears, BR_FEATURES_SILVERLIGHT as br_features_silverlight,
  BR_COOKIES as br_cookies, BR_COLORDEPTH as br_colordepth, BR_VIEWWIDTH as br_viewwidth, BR_VIEWHEIGHT as br_viewheight,
  OS_NAME as os_name, OS_FAMILY as os_family, OS_MANUFACTURER as os_manufacturer, OS_TIMEZONE as os_timezone,
  DVCE_TYPE as dvce_type, DVCE_ISMOBILE as dvce_ismobile, DVCE_SCREENWIDTH as dvce_screenwidth, DVCE_SCREENHEIGHT as dvce_screenheight,
  DOC_CHARSET as doc_charset, DOC_WIDTH as doc_width, DOC_HEIGHT as doc_height,
  TR_CURRENCY as tr_currency, TR_TOTAL_BASE as tr_total_base, TR_TAX_BASE as tr_tax_base, TR_SHIPPING_BASE as tr_shipping_base, TI_CURRENCY as ti_currency, TI_PRICE_BASE as ti_price_base, BASE_CURRENCY as base_currency,
  GEO_TIMEZONE as geo_timezone, MKT_CLICKID as mkt_clickid, MKT_NETWORK as mkt_network, ETL_TAGS as etl_tags, DVCE_SENT_TSTAMP as dvce_sent_tstamp,
  REFR_DOMAIN_USERID as refr_domain_userid, REFR_DVCE_TSTAMP as refr_dvce_tstamp, DOMAIN_SESSIONID as domain_sessionid, DERIVED_TSTAMP as derived_tstamp,
  EVENT_VENDOR as event_vendor, EVENT_NAME as event_name, EVENT_FORMAT as event_format, EVENT_VERSION as event_version, EVENT_FINGERPRINT as event_fingerprint,
  TRUE_TSTAMP as true_tstamp, LOAD_TSTAMP as load_tstamp,
  CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1_0_0 as contexts_com_snowplowanalytics_snowplow_web_page_1_0_0,
  CONTEXTS_COM_IAB_SNOWPLOW_SPIDERS_AND_ROBOTS_1_0_0 as contexts_com_iab_snowplow_spiders_and_robots_1_0_0,
  CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_UA_PARSER_CONTEXT_1_0_0 as contexts_com_snowplowanalytics_snowplow_ua_parser_context_1_0_0,
  CONTEXTS_NL_BASJES_YAUAA_CONTEXT_1_0_0 as contexts_nl_basjes_yauaa_context_1_0_0
from atomic.events;
drop table atomic.events;
alter table atomic.events_lower rename to events;

You can also remove snowplow__events from your variables, as this is the default anyway. Finally, you’ll need to drop the existing derived and scratch datasets, because BigQuery won’t update the column names in the tables there:

drop schema atomic_derived cascade;
drop schema atomic_scratch cascade;
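
Before re-running, you can sanity-check that the rename worked (the same INFORMATION_SCHEMA pattern as the check earlier; adjust the dataset name if yours differs):

-- Should now return only lowercase context column names.
select column_name
from atomic.INFORMATION_SCHEMA.COLUMNS
where table_name = 'events'
  and column_name like 'contexts_%';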

When you then run it after this, it should all work!

Thank you, Ryan. I’ll try what you suggested and report back.

If I’m not mistaken, BQ can be made case-insensitive for column names. However, that’s not the case for this table at the moment:
[screenshot]

Ah, I think my screenshot above is about table names in a dataset, not column names. However, the BQ docs say:

Duplicate column names are not allowed even if the case differs. For example, a column named Column1 is considered identical to a column named column1.
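
For example (illustrative only, with a throwaway table name), BigQuery rejects this:

-- Fails: duplicate column names that differ only in case.
create table atomic.case_demo (Column1 string, column1 string);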

Yeah, we pull the column names to coalesce minor and patch versions of our contexts into a single column, so we look for a specific (case-sensitive) prefix, which in this case it wasn’t finding, unfortunately.
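
Roughly, what the package generates looks like this (an illustrative sketch, not the actual macro output — the _1_0_1 column and the category field here are hypothetical, borrowed from the IAB enrichment schema):

-- Coalesce each field across every version of the context found by the
-- prefix lookup, newest first, into a single logical column.
select
  coalesce(
    contexts_com_iab_snowplow_spiders_and_robots_1_0_1[safe_offset(0)].category,
    contexts_com_iab_snowplow_spiders_and_robots_1_0_0[safe_offset(0)].category
  ) as iab_category
from atomic.events;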