Having some problems running the Snowplow Web Model in dbt Cloud

Ok, so I’m set up in dbt Cloud for data modelling - I’ve built some very primitive models myself, but I wanted to integrate the Snowplow dbt packages because I thought they would just make everything easier.

Here’s my setup, and here’s the problem I’m having. I’m probably missing something, but when a run takes 1h 40m to fail, it’s really hard to debug efficiently.

So I have a Development environment set up, using dbt v1.0, and it uses the development custom schema prefix.

In dbt_project.yml I have

# Configuring packages
# https://docs.snowplowanalytics.com/docs/modeling-your-data/the-snowplow-web-data-model/dbt-web-data-model/
# https://snowplow.github.io/dbt-snowplow-web/#!/overview/snowplow_web
# https://snowplow.github.io/dbt-snowplow-mobile/#!/overview/snowplow_mobile
vars:
  snowplow_web:
    snowplow__enable_iab: false 
    snowplow__enable_ua: true
    snowplow__enable_yauaa: true
    snowplow__start_date: '2021-07-06' # First events in the pipeline
    snowplow__min_visit_length: 10 # 5 is default value (secs)
    snowplow__heartbeat: 10 # Default value (secs)
  snowplow_mobile:
    snowplow__enable_mobile_context: true
    snowplow__enable_geolocation_context: false 
    snowplow__enable_application_context: true
    snowplow__enable_screen_context: true
    snowplow__start_date: '2022-01-12'

In packages.yml I have

packages:
  - package: snowplow/snowplow_utils
    version: 0.6.0
  - package: snowplow/snowplow_web
    version: 0.5.1
  - package: snowplow/snowplow_mobile
    version: 0.2.0

I’ve run dbt deps. Everything is installed fine.

When I do dbt run, it fails - consistently on this part today (yesterday my dbt Cloud IDE session expired before I could grab the error). So I’ve done two runs.

This is from a cold start, so it went back to backfill events from the first time period. It’s only 823 events by the looks of it (just the first month of test data), so it’s not like there’s a million events killing it.

In snowplow_web_base_sessions_this_run:

2022-03-03T12:47:35.912779Z: 12:47:35  SQL status: SELECT 823 in 1.27 seconds
2022-03-03T12:47:35.915901Z: 12:47:35  Using postgres connection "model.snowplow_web.snowplow_web_base_sessions_this_run"
2022-03-03T12:47:35.916033Z: 12:47:35  On model.snowplow_web.snowplow_web_base_sessions_this_run: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "model.snowplow_web.snowplow_web_base_sessions_this_run"} */
alter table "snowplow"."development_scratch"."snowplow_web_base_sessions_this_run" rename to "snowplow_web_base_sessions_this_run__dbt_backup"

This is where the run fails. But it doesn’t give me any information as to why!

NOTE: I don’t think this has anything to do with DB permissions, because the mobile equivalent runs and succeeds fine:

2022-03-03T12:49:24.393795Z: 12:49:24  SQL status: SELECT 119 in 95.11 seconds
2022-03-03T12:49:24.397172Z: 12:49:24  Using postgres connection "model.snowplow_mobile.snowplow_mobile_base_events_this_run"
2022-03-03T12:49:24.397323Z: 12:49:24  On model.snowplow_mobile.snowplow_mobile_base_events_this_run: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "model.snowplow_mobile.snowplow_mobile_base_events_this_run"} */
alter table "snowplow"."development_scratch"."snowplow_mobile_base_events_this_run" rename to "snowplow_mobile_base_events_this_run__dbt_backup"
2022-03-03T12:49:25.069136Z: 12:49:25  SQL status: ALTER TABLE in 0.67 seconds
2022-03-03T12:49:25.071897Z: 12:49:25  Using postgres connection "model.snowplow_mobile.snowplow_mobile_base_events_this_run"
2022-03-03T12:49:25.072030Z: 12:49:25  On model.snowplow_mobile.snowplow_mobile_base_events_this_run: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "model.snowplow_mobile.snowplow_mobile_base_events_this_run"} */
alter table "snowplow"."development_scratch"."snowplow_mobile_base_events_this_run__dbt_tmp" rename to "snowplow_mobile_base_events_this_run"
2022-03-03T12:49:25.205347Z: 12:49:25  SQL status: ALTER TABLE in 0.13 seconds
2022-03-03T12:49:25.206725Z: 12:49:25  On model.snowplow_mobile.snowplow_mobile_base_events_this_run: COMMIT
2022-03-03T12:49:25.206861Z: 12:49:25  Using postgres connection "model.snowplow_mobile.snowplow_mobile_base_events_this_run"
2022-03-03T12:49:25.206938Z: 12:49:25  On model.snowplow_mobile.snowplow_mobile_base_events_this_run: COMMIT
2022-03-03T12:49:25.407884Z: 12:49:25  SQL status: COMMIT in 0.2 seconds
2022-03-03T12:49:25.410228Z: 12:49:25  Using postgres connection "model.snowplow_mobile.snowplow_mobile_base_events_this_run"
2022-03-03T12:49:25.410361Z: 12:49:25  On model.snowplow_mobile.snowplow_mobile_base_events_this_run: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "model.snowplow_mobile.snowplow_mobile_base_events_this_run"} */
drop table if exists "snowplow"."development_scratch"."snowplow_mobile_base_events_this_run__dbt_backup" cascade
2022-03-03T12:49:26.019316Z: 12:49:26  SQL status: DROP TABLE in 0.61 seconds
2022-03-03T12:49:26.020650Z: 12:49:26  finished collecting timing info
2022-03-03T12:49:26.020830Z: 12:49:26  On model.snowplow_mobile.snowplow_mobile_base_events_this_run: Close
2022-03-03T12:49:26.021525Z: 12:49:26  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '02046cee-86db-4d21-8607-229a840ae7d8', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f22f73640a0>]}
2022-03-03T12:49:26.021893Z: 12:49:26  18 of 48 OK created table model development_scratch.snowplow_mobile_base_events_this_run [SELECT 119e in 97.49s]
2022-03-03T12:49:26.022011Z: 12:49:26  Finished running node model.snowplow_mobile.snowplow_mobile_base_events_this_run

Hey @angelsk,

I’ve had a look at this and can’t seem to find anything wrong. I’ve tried running this same setup (albeit with around 200 events spread across web and mobile instead of 823) using both dbt v1.0.0 and dbt v1.0.3, and both the web and mobile models finish running within about 30 seconds, so I’m not really sure what’s going wrong.

One of the differences between our setups is that I’m running the dbt CLI versus your dbt Cloud setup, but I don’t see why that would result in such a long delay. Is it possible for you to try running this setup on the dbt CLI? While looking around the dbt-core repository for something similar, I also stumbled across this issue (which has been around since dbt v0.19.0) that slows down runs on Postgres; it’s fixed in dbt v1.0.4, which is yet to be released.

Could I ask you to try running this model using the dbt CLI, or alternatively using dbt v1.0.0, to see if that changes things for the better? Otherwise I’m afraid my next piece of advice would be to wait for dbt v1.0.4. I’ll also see if I can get a dbt Cloud environment set up next week to test things if none of the above options resolve your issue. Sorry about that, and I hope this helps!

Cheers,
Emiel


Hi @Emiel - sorry, I’ve been off for the last week and haven’t had a chance to catch up on this.

I tried again today. I deleted all the sp_web tables from the development schema, because I couldn’t load the contents of the broken tables - so I figured that was maybe causing the problems. But no joy.

This time it failed on snowplow_web_page_views_this_run - the whole task took nearly 2 hours to run, but all of that (aside from the first minute) was spent on this one model, so there is a problem here. It couldn’t run create table "snowplow"."development_scratch"."snowplow_web_page_views_this_run__dbt_tmp". I think this is probably what happened the first time I ran it, but the IDE timed out before I could see the failure.

It hasn’t created any broken tables this time around - but I’m not sure how to handle it failing partway through a run… if I run dbt run again, will it break in the same place?

In answer to your questions:

  • No, I don’t have anything set up on the dbt CLI - I’m running this all in the cloud through the development IDE at the moment; this was recommended to me by Snowplow when I was looking for a data modelling solution :slight_smile: - I assumed dbt Cloud just uses the dbt CLI under the hood
  • I am using "dbt_version": "1.0.3" and have been for a while
  • 823 events isn’t a lot - that was just a bunch of internal testing. There are going to be more.

If it’s this Postgres macro thing… then I guess I can’t do anything until 1.0.4 is released? But it doesn’t look like the task that’s timing out uses that?

Any ideas?

Hey @angelsk, thanks for getting back to me, and thanks for the additional context.

Since your post I’ve created a dbt Cloud environment to see if I could replicate the issue you’re having, but I’ve been unable to do so. Before I go into more detail, just to add some context on why I asked those questions: I was wondering whether you had the dbt CLI set up, as that was the only difference I saw between my setup and yours; if it had been, I would have asked you to run the exact same setup using the CLI to see if that made a difference, although I agree that dbt Cloud just uses the dbt CLI under the hood and should therefore produce identical results.

So as mentioned previously, I’ve set up a dbt Cloud account which, with a comparable number of events and the same setup, runs through all models in about 30 seconds, so I think the problem might be with the Postgres instance itself. Could it be that one of the tables is being locked by another process? I know you dropped all the snowplow_web tables, but I can’t really think of another explanation at the moment. It might be worth dropping the development schemas again, re-running the model and, if it’s taking longer than a couple of minutes, having a look at the pg_locks table to see whether any locks are currently in place and which queries are causing them. You can read a bit more about lock monitoring in Postgres here. If you have some time, could you check whether the locks provide any additional information? If that doesn’t let you solve the problem, it would be great if you could share what you’ve gleaned from the locks table so I can try to help.
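
For example (a minimal check, assuming Postgres 9.6 or later so that pg_blocking_pids() is available), something along these lines shows any session that is currently waiting on a lock and which backend is holding it:

-- sessions currently waiting on a lock, and the pids blocking them (Postgres 9.6+)
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;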

Sorry that I don’t have the silver bullet quite yet, but hopefully we can figure this out soon!
Have a great day,
Emiel


I don’t see how the table could be in use by a different process - it’s the development schema and I’m the only dbt Cloud user.

But I shall try one more time.

And in true “let’s try to debug it, oh now you’re going to work fine” fashion: I deleted all the mobile and web tables in the development_* schemas (I didn’t want to delete the whole schema because permissions are a pain), then went through the locks and active queries and made sure it was all clean.

And then it ran perfectly.

So all I can say is thanks @Emiel - I don’t know why it broke and blocked the first time, but I appreciate your help in debugging and fixing this. Now I’m going to run it several more times to get up to date with the data and see if it gets stuck again. :slight_smile:

NOTE: I did upgrade my packages, so maybe that had an impact.

packages:
  - package: snowplow/snowplow_utils
    version: 0.8.0
  - package: snowplow/snowplow_web
    version: 0.6.0
  - package: snowplow/snowplow_mobile
    version: 0.3.1

And for anyone else running into issues: along with the lovely wiki link @Emiel provided for finding blocking queries, you can see all active queries in Postgres 9.2+ with

-- active queries
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;

And I may have spoken too soon. I’ve run into the same behaviour again, but this time I have more information.

For reference: I ran the select query on its own, without the table creation, and it ran fine and quickly - it returns 58k rows.

I’m on run 4, and we have some “stuck” queries. Nothing is being blocked, but these seem stuck for the snowplow_web_pv_yauaa task. This ran perfectly for the first 3 runs, which each completed in under 2 minutes.

pid,age,usename,query
32647,0 years 0 mons 0 days 0 hours 9 mins 10.843863 secs,dbt_test,"/* 
{""app"": ""dbt"", ""dbt_version"": ""1.0.3"", ""profile_name"": ""user"", ""target_name"": ""default"", ""node_id"": ""model.snowplow_web.snowplow_web_pv_yauaa""} */


  create  table ""snowplow"".""development_scratch"".""snowplow_web_pv_yauaa__dbt_tmp""
  as (
    


select
  pv.page_view_id,

  ya.device_class,
  ya.agent_class,
  ya.agent_name,
  ya.agent_name_version,
  ya.agent_name_version_major,
  ya.agent_version,
  ya.agent_version_major,
  ya.device_brand,
  ya.device_name,
  ya.device_version,
  ya.layout_engine_class,
  ya.layout_engine_name,
  ya.layout_engine_name_version,
  ya.layout_engine_name_version_major,
  ya.layout_engine_version,
  ya.layout_engine_version_major,
  ya.operating_system_class,
  ya.operating_system_name,
  ya.operating_system_name_version,
  ya.operating_system_version

from ""snowplow"".""atomic"".""nl_basjes_yauaa_context_1"" ya

inner join ""snowplow"".""development_scratch"".""snowplow_web_page_view_events"" pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp

where ya.root_tstamp >= (select lower_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  and ya.root_tstamp <= (select upper_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  );"
32648,0 years 0 mons 0 days 0 hours 9 mins 10.843866 secs,dbt_test,"/*
 {""app"": ""dbt"", ""dbt_version"": ""1.0.3"", ""profile_name"": ""user"", ""target_name"": ""default"", ""node_id"": ""model.snowplow_web.snowplow_web_pv_yauaa""} */


  create  table ""snowplow"".""development_scratch"".""snowplow_web_pv_yauaa__dbt_tmp""
  as (
    


select
  pv.page_view_id,

  ya.device_class,
  ya.agent_class,
  ya.agent_name,
  ya.agent_name_version,
  ya.agent_name_version_major,
  ya.agent_version,
  ya.agent_version_major,
  ya.device_brand,
  ya.device_name,
  ya.device_version,
  ya.layout_engine_class,
  ya.layout_engine_name,
  ya.layout_engine_name_version,
  ya.layout_engine_name_version_major,
  ya.layout_engine_version,
  ya.layout_engine_version_major,
  ya.operating_system_class,
  ya.operating_system_name,
  ya.operating_system_name_version,
  ya.operating_system_version

from ""snowplow"".""atomic"".""nl_basjes_yauaa_context_1"" ya

inner join ""snowplow"".""development_scratch"".""snowplow_web_page_view_events"" pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp

where ya.root_tstamp >= (select lower_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  and ya.root_tstamp <= (select upper_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  );"
23293,0 years 0 mons 0 days 0 hours 9 mins 10.84387 secs,dbt_test,"/* 
{""app"": ""dbt"", ""dbt_version"": ""1.0.3"", ""profile_name"": ""user"", ""target_name"": ""default"", ""node_id"": ""model.snowplow_web.snowplow_web_pv_yauaa""} */


  create  table ""snowplow"".""development_scratch"".""snowplow_web_pv_yauaa__dbt_tmp""
  as (
    


select
  pv.page_view_id,

  ya.device_class,
  ya.agent_class,
  ya.agent_name,
  ya.agent_name_version,
  ya.agent_name_version_major,
  ya.agent_version,
  ya.agent_version_major,
  ya.device_brand,
  ya.device_name,
  ya.device_version,
  ya.layout_engine_class,
  ya.layout_engine_name,
  ya.layout_engine_name_version,
  ya.layout_engine_name_version_major,
  ya.layout_engine_version,
  ya.layout_engine_version_major,
  ya.operating_system_class,
  ya.operating_system_name,
  ya.operating_system_name_version,
  ya.operating_system_version

from ""snowplow"".""atomic"".""nl_basjes_yauaa_context_1"" ya

inner join ""snowplow"".""development_scratch"".""snowplow_web_page_view_events"" pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp

where ya.root_tstamp >= (select lower_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  and ya.root_tstamp <= (select upper_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  );"

It seems to be running the same query 3 times.

Oh no, @angelsk! I was about to write a congratulatory reply, but now there’s more digging to be done! I find it super odd that dbt is trying to run the exact same query 3 times, and we can see three distinct process IDs as well. I assume you have multiple threads configured in your profiles.yml? Could you kill the running processes, and then try reducing the number of threads for your Postgres profile in profiles.yml to 1?
To see all queries that have been running for more than 20 minutes, you can run the following:

SELECT
  pid,
  now() - pg_stat_activity.query_start AS duration,
  query,
  state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '20 minutes';

Then to cancel them you would execute:

SELECT pg_cancel_backend( {pid} );

and if cancelling doesn’t work then you might have to terminate them, although that can cause some issues with Postgres whereby it has to restart:

SELECT pg_terminate_backend( {pid} );
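
If several copies of the same statement are hanging around, a one-shot variant along these lines might save copy-pasting pids - the ILIKE filter on the yauaa temp table is just an example, so adjust it to whatever is actually stuck:

-- cancel every active statement mentioning the stuck model (example filter)
SELECT pid, pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE '%snowplow_web_pv_yauaa__dbt_tmp%'
  AND pid <> pg_backend_pid();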

After this, could you re-run the model with only one thread and see if it works then? We would expect an increase in running time, but this should prevent multiple copies of the same query firing off in parallel, even if for some reason they still end up running in succession. I think this might be a bug in dbt, as I see nowhere in our logic where we could possibly try to execute the same query three times in parallel. Maybe this is what happened before and caused the tables to be blocked?

Please let me know how it goes with this, and sorry that we still haven’t solved the problem yet!
Emiel

Hi @Emiel - I know right - but this is obviously what happened before - we just didn’t know how to catch it.

I am just using the default dbt Cloud profile, so I’m going to have to do some digging to figure out how to reduce it from 4 threads to 1.

# This setting configures which "profile" dbt uses for this project.
profile: 'default'

Am I safe to re-run it, given that it obviously got halfway through a batch? Would that result in missing records, or is the model clever enough to know that something went wrong?

Also, just to note that Production uses the same profile (obviously, because it’s in the project config), so that also runs 4 threads. It is configurable per job though, so I’ve set my data freshness check to 1 thread to see if that stops the timeouts I’ve been having.

ETA: So it’s configurable in the developer settings: Setting a custom target name | dbt Docs

ETA2: Sadly it does not resume nicely, so I’ve had to nuke all the web_* tables in the 3 schemas. Going to start again with a single thread and see how many runs it takes before I break it again :wink:

ETA3: Going from 4 threads to 1 adds an extra minute and a half to the run length, which is probably also because it’s building the entire history :slight_smile:

So @Emiel - it’s working fine with one thread. Obviously quite slow, as it’s now starting to process chunks of 300k events :slight_smile: (5+ minutes per go).

Successful run 4 with lots of events 17:43:29 14 of 49 OK created table model development_scratch.snowplow_web_base_events_this_run [SELECT 213386e in 35.48s]

Ideally, would like to be able to run these multi-threaded, especially in production.

And we hit a failure :frowning:

Run 5 has failed… after 12 minutes, so it’s not the same problem as before. But this is what happened when I tried to resume after cancelling the 3 threads. It’s failing on snowplow_web_page_views, but I don’t have any errors visible. This is the log for the last job as it failed. I also checked the DB and there are no queries or locks hanging around.

For clarity
2022-03-16T17:48:35.220402Z: 17:48:35 14 of 49 OK created table model development_scratch.snowplow_web_base_events_this_run [SELECT 283320 in 47.86s]

2022-03-16T17:49:50.448415Z: 17:49:50  Began running node model.snowplow_web.snowplow_web_page_views
2022-03-16T17:49:50.448613Z: 17:49:50  33 of 49 START snowplow_incremental model development_derived.snowplow_web_page_views [RUN]
2022-03-16T17:49:50.448845Z: 17:49:50  Acquiring new postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:50.448937Z: 17:49:50  Began compiling node model.snowplow_web.snowplow_web_page_views
2022-03-16T17:49:50.449023Z: 17:49:50  Compiling model.snowplow_web.snowplow_web_page_views
2022-03-16T17:49:50.460636Z: 17:49:50  Using postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:50.460766Z: 17:49:50  On model.snowplow_web.snowplow_web_page_views: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "dev", "node_id": "model.snowplow_web.snowplow_web_page_views"} */

    
        select 
          case when 
            (select upper_limit from "snowplow"."development_scratch"."snowplow_web_base_new_event_limits") 
            <= (select last_success from "snowplow"."development_snowplow_manifest"."snowplow_web_incremental_manifest" where model = 'snowplow_web_page_views') 
          then false 
        else true end
      
  
2022-03-16T17:49:50.460854Z: 17:49:50  Opening a new connection, currently in state closed
2022-03-16T17:49:51.076675Z: 17:49:51  SQL status: SELECT 1 in 0.62 seconds
2022-03-16T17:49:51.078045Z: 17:49:51  Writing injected SQL for node "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:51.091461Z: 17:49:51  finished collecting timing info
2022-03-16T17:49:51.091624Z: 17:49:51  Began executing node model.snowplow_web.snowplow_web_page_views
2022-03-16T17:49:51.095109Z: 17:49:51  Using postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:51.095213Z: 17:49:51  On model.snowplow_web.snowplow_web_page_views: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "dev", "node_id": "model.snowplow_web.snowplow_web_page_views"} */

    

  create temporary table "snowplow_web_page_views__dbt_tmp174951094657"
  as (
    


select * 
from "snowplow"."development_scratch"."snowplow_web_page_views_this_run"
where cast(True as boolean) --returns false if run doesn't contain new events.
  );
  
2022-03-16T17:49:51.792497Z: 17:49:51  SQL status: SELECT 79015 in 0.7 seconds
2022-03-16T17:49:51.795714Z: 17:49:51  Using postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:51.795839Z: 17:49:51  On model.snowplow_web.snowplow_web_page_views: BEGIN
2022-03-16T17:49:51.872034Z: 17:49:51  SQL status: BEGIN in 0.08 seconds
2022-03-16T17:49:51.872201Z: 17:49:51  Using postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:51.872283Z: 17:49:51  On model.snowplow_web.snowplow_web_page_views: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "dev", "node_id": "model.snowplow_web.snowplow_web_page_views"} */

      select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from INFORMATION_SCHEMA.columns
      where table_name = 'snowplow_web_page_views__dbt_tmp174951094657'
        
      order by ordinal_position

  
2022-03-16T17:49:51.963152Z: 17:49:51  SQL status: SELECT 99 in 0.09 seconds
2022-03-16T17:49:51.968499Z: 17:49:51  Using postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:51.968628Z: 17:49:51  On model.snowplow_web.snowplow_web_page_views: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "dev", "node_id": "model.snowplow_web.snowplow_web_page_views"} */

      select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from "snowplow".INFORMATION_SCHEMA.columns
      where table_name = 'snowplow_web_page_views'
        
        and table_schema = 'development_derived'
        
      order by ordinal_position

  
2022-03-16T17:49:52.054703Z: 17:49:52  SQL status: SELECT 99 in 0.09 seconds
2022-03-16T17:49:52.060093Z: 17:49:52  Using postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:52.060214Z: 17:49:52  On model.snowplow_web.snowplow_web_page_views: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "dev", "node_id": "model.snowplow_web.snowplow_web_page_views"} */

      select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from "snowplow".INFORMATION_SCHEMA.columns
      where table_name = 'snowplow_web_page_views'
        
        and table_schema = 'development_derived'
        
      order by ordinal_position

  
2022-03-16T17:49:52.143510Z: 17:49:52  SQL status: SELECT 99 in 0.08 seconds
2022-03-16T17:49:52.147737Z: 17:49:52  Writing runtime SQL for node "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:52.174755Z: 17:49:52  Using postgres connection "model.snowplow_web.snowplow_web_page_views"
2022-03-16T17:49:52.174891Z: 17:49:52  On model.snowplow_web.snowplow_web_page_views: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "dev", "node_id": "model.snowplow_web.snowplow_web_page_views"} */

      
    
  -- define upsert limits
  
      with vars as (
        select 
              

    min(start_tstamp) + ((interval '1 day') * (-30))

 as lower_limit,
                   max(start_tstamp) as upper_limit
        from "snowplow_web_page_views__dbt_tmp174951094657"
      )
    

  -- run the delete+insert statement
  delete from "snowplow"."development_derived"."snowplow_web_page_views"
    where (page_view_id) in (
        select (page_view_id)
        from "snowplow_web_page_views__dbt_tmp174951094657"
    )
     and start_tstamp between (select lower_limit from vars) and (select upper_limit from vars) ;

    insert into "snowplow"."development_derived"."snowplow_web_page_views" ("page_view_id", "event_id", "app_id", "user_id", "domain_userid", "network_userid", "domain_sessionid", "domain_sessionidx", "page_view_in_session_index", "page_views_in_session", "dvce_created_tstamp", "collector_tstamp", "derived_tstamp", "start_tstamp", "end_tstamp", "model_tstamp", "engaged_time_in_s", "absolute_time_in_s", "horizontal_pixels_scrolled", "vertical_pixels_scrolled", "horizontal_percentage_scrolled", "vertical_percentage_scrolled", "doc_width", "doc_height", "page_title", "page_url", "page_urlscheme", "page_urlhost", "page_urlpath", "page_urlquery", "page_urlfragment", "mkt_medium", "mkt_source", "mkt_term", "mkt_content", "mkt_campaign", "mkt_clickid", "mkt_network", "page_referrer", "refr_urlscheme", "refr_urlhost", "refr_urlpath", "refr_urlquery", "refr_urlfragment", "refr_medium", "refr_source", "refr_term", "geo_country", "geo_region", "geo_region_name", "geo_city", "geo_zipcode", "geo_latitude", "geo_longitude", "geo_timezone", "user_ipaddress", "useragent", "br_lang", "br_viewwidth", "br_viewheight", "br_colordepth", "br_renderengine", "os_timezone", "category", "primary_impact", "reason", "spider_or_robot", "useragent_family", "useragent_major", "useragent_minor", "useragent_patch", "useragent_version", "os_family", "os_major", "os_minor", "os_patch", "os_patch_minor", "os_version", "device_family", "device_class", "agent_class", "agent_name", "agent_name_version", "agent_name_version_major", "agent_version", "agent_version_major", "device_brand", "device_name", "device_version", "layout_engine_class", "layout_engine_name", "layout_engine_name_version", "layout_engine_name_version_major", "layout_engine_version", "layout_engine_version_major", "operating_system_class", "operating_system_name", "operating_system_name_version", "operating_system_version")
    (
        select "page_view_id", "event_id", "app_id", "user_id", "domain_userid", "network_userid", "domain_sessionid", "domain_sessionidx", "page_view_in_session_index", "page_views_in_session", "dvce_created_tstamp", "collector_tstamp", "derived_tstamp", "start_tstamp", "end_tstamp", "model_tstamp", "engaged_time_in_s", "absolute_time_in_s", "horizontal_pixels_scrolled", "vertical_pixels_scrolled", "horizontal_percentage_scrolled", "vertical_percentage_scrolled", "doc_width", "doc_height", "page_title", "page_url", "page_urlscheme", "page_urlhost", "page_urlpath", "page_urlquery", "page_urlfragment", "mkt_medium", "mkt_source", "mkt_term", "mkt_content", "mkt_campaign", "mkt_clickid", "mkt_network", "page_referrer", "refr_urlscheme", "refr_urlhost", "refr_urlpath", "refr_urlquery", "refr_urlfragment", "refr_medium", "refr_source", "refr_term", "geo_country", "geo_region", "geo_region_name", "geo_city", "geo_zipcode", "geo_latitude", "geo_longitude", "geo_timezone", "user_ipaddress", "useragent", "br_lang", "br_viewwidth", "br_viewheight", "br_colordepth", "br_renderengine", "os_timezone", "category", "primary_impact", "reason", "spider_or_robot", "useragent_family", "useragent_major", "useragent_minor", "useragent_patch", "useragent_version", "os_family", "os_major", "os_minor", "os_patch", "os_patch_minor", "os_version", "device_family", "device_class", "agent_class", "agent_name", "agent_name_version", "agent_name_version_major", "agent_version", "agent_version_major", "device_brand", "device_name", "device_version", "layout_engine_class", "layout_engine_name", "layout_engine_name_version", "layout_engine_name_version_major", "layout_engine_version", "layout_engine_version_major", "operating_system_class", "operating_system_name", "operating_system_name_version", "operating_system_version"
        from "snowplow_web_page_views__dbt_tmp174951094657"
    );

I did a subsequent run to see what would happen. And it basically skipped to the next batch. So it didn’t finish the “broken” run.

Hey @angelsk, looks like there are loads of updates to parse through here!

So first of all, I’m glad to see that working with one thread had initially solved the problem. I found this issue in the dbt GitHub as well which seems to be related, although the error messages you’re getting are different and I don’t think it’s exactly the same.

The dbt package doesn’t update the manifest table until the end of the run, when there is a run-results object; the on-run-end hook then uses that object when calling the snowplow_incremental_post_hook macro to update the manifest table. So if a run is ever disrupted, it should be fine to just restart it at any time (after cancelling any hanging queries) and the model will restart from where the latest run succeeded, thereby processing all the data from the failed run again. It’s very interesting that that isn’t happening in your case - can you have a look at the snowplow_{package}_incremental_manifest table after a failed run and share the results with me? It should still show the timestamps of the latest successful run, and I don’t see any way in the code that that wouldn’t be the case.
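
For reference, a quick way to pull that out for the web package (assuming the manifest lives in the development_snowplow_manifest schema, as it does in the logs above) would be:

-- inspect the incremental manifest after a failed run
SELECT model, last_success
FROM "snowplow"."development_snowplow_manifest"."snowplow_web_incremental_manifest"
ORDER BY last_success, model;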

As for the last issue, did dbt Cloud offer any error message at all? I can’t see any reason in the logs you sent why there would be an error. One curious thing I can see is that the following SQL statement seems to be run twice:

      select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from "snowplow".INFORMATION_SCHEMA.columns
      where table_name = 'snowplow_web_page_views'
        
        and table_schema = 'development_derived'
        
      order by ordinal_position

This again seems like a bug, and is similar behaviour to what we were seeing last time with source freshness being checked three times, and multiple copies of the same query running either in parallel or in sequence. Would you mind reaching out to dbt support to see if they can help you with that? There is nothing in our package to suggest that this should be happening, and I fail to see how we could make dbt do that even if we wanted to. Could you also potentially enable pg_stat_statements as shown here and share some of that data with me, so we can see which statements are being run multiple times in sequence and how many times they are being run?
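
Once pg_stat_statements is enabled (it has to be added to shared_preload_libraries, which usually means a server restart), a query along these lines should surface which statements are being executed repeatedly - the ILIKE filter is only an illustration:

-- create the extension once the library has been preloaded
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- most frequently executed Snowplow statements
SELECT calls, query
FROM pg_stat_statements
WHERE query ILIKE '%snowplow_web%'
ORDER BY calls DESC
LIMIT 20;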

Sorry for asking so much of you and still not providing a solid answer, this is proving to be quite a tricky problem!
Let me know when you’ve found the time to go through all of this, have a great day!
Emiel

Hi @Emiel - new week, fresh (tired) eyes on this.

  1. I’ve put in a ticket with dbt.

  2. We’re using AWS and this is the production DB, so I’m wary about enabling pg_stat_statements. Mainly a confidence thing.

  3. I have done a new, fully refreshed run with just web, and have run the incremental job several times. These are the results (dbt run --select snowplow_web):

run 1-4: no problems, 1-3 minutes per run
run 5: no problems, but took 18 minutes
run 6: failed in the snowplow_web_pv_yauaa job this time, no “error” in the log, just “failed” - only one statement. 12 passed, 12 queued, 0 failed. But run marked as FAILED after 5-6 minutes.

This is run 6:
12:16:36 + Snowplow: Processing data between '2021-12-02 13:14:41' and '2022-01-01 19:14:41'

12:16:27  Found 49 models, 283 tests, 0 snapshots, 0 analyses, 495 macros, 4 operations, 0 seed files, 16 sources, 0 exposures, 0 metrics
12:16:33  Running 2 on-run-start hooks
12:16:33  Concurrency: 1 threads (target='dev')

In the DB there are 3 (THREE AGAIN?!) processes running for the yauaa model - still running even though the job is marked as failed, and despite this being only one thread in dev.

So I think what it’s doing is: rather than running for 2+ hours and then timing out because it’s blocked itself (though there’s nothing in the blocking queries in pgsql), it’s being sensible, deciding it doesn’t have a response, and FAILING the job. But the logs don’t show this.

pid,age,usename,query
24250,0 years 0 mons 0 days 0 hours 26 mins 3.236494 secs,dbt_test,"/* {""app"": ""dbt"", ""dbt_version"": ""1.0.4"", ""profile_name"": ""user"", ""target_name"": ""dev"", ""node_id"": ""model.snowplow_web.snowplow_web_pv_yauaa""} */


  create  table ""snowplow"".""development_scratch"".""snowplow_web_pv_yauaa__dbt_tmp""
  as (
    


select
  pv.page_view_id,

  ya.device_class,
  ya.agent_class,
  ya.agent_name,
  ya.agent_name_version,
  ya.agent_name_version_major,
  ya.agent_version,
  ya.agent_version_major,
  ya.device_brand,
  ya.device_name,
  ya.device_version,
  ya.layout_engine_class,
  ya.layout_engine_name,
  ya.layout_engine_name_version,
  ya.layout_engine_name_version_major,
  ya.layout_engine_version,
  ya.layout_engine_version_major,
  ya.operating_system_class,
  ya.operating_system_name,
  ya.operating_system_name_version,
  ya.operating_system_version

from ""snowplow"".""atomic"".""nl_basjes_yauaa_context_1"" ya

inner join ""snowplow"".""development_scratch"".""snowplow_web_page_view_events"" pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp

where ya.root_tstamp >= (select lower_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  and ya.root_tstamp <= (select upper_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  );"
19447,0 years 0 mons 0 days 0 hours 26 mins 3.236495 secs,dbt_test,"/* {""app"": ""dbt"", ""dbt_version"": ""1.0.4"", ""profile_name"": ""user"", ""target_name"": ""dev"", ""node_id"": ""model.snowplow_web.snowplow_web_pv_yauaa""} */


  create  table ""snowplow"".""development_scratch"".""snowplow_web_pv_yauaa__dbt_tmp""
  as (
    


select
  pv.page_view_id,

  ya.device_class,
  ya.agent_class,
  ya.agent_name,
  ya.agent_name_version,
  ya.agent_name_version_major,
  ya.agent_version,
  ya.agent_version_major,
  ya.device_brand,
  ya.device_name,
  ya.device_version,
  ya.layout_engine_class,
  ya.layout_engine_name,
  ya.layout_engine_name_version,
  ya.layout_engine_name_version_major,
  ya.layout_engine_version,
  ya.layout_engine_version_major,
  ya.operating_system_class,
  ya.operating_system_name,
  ya.operating_system_name_version,
  ya.operating_system_version

from ""snowplow"".""atomic"".""nl_basjes_yauaa_context_1"" ya

inner join ""snowplow"".""development_scratch"".""snowplow_web_page_view_events"" pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp

where ya.root_tstamp >= (select lower_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  and ya.root_tstamp <= (select upper_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  );"
19448,0 years 0 mons 0 days 0 hours 26 mins 3.236498 secs,dbt_test,"/* {""app"": ""dbt"", ""dbt_version"": ""1.0.4"", ""profile_name"": ""user"", ""target_name"": ""dev"", ""node_id"": ""model.snowplow_web.snowplow_web_pv_yauaa""} */


  create  table ""snowplow"".""development_scratch"".""snowplow_web_pv_yauaa__dbt_tmp""
  as (
    


select
  pv.page_view_id,

  ya.device_class,
  ya.agent_class,
  ya.agent_name,
  ya.agent_name_version,
  ya.agent_name_version_major,
  ya.agent_version,
  ya.agent_version_major,
  ya.device_brand,
  ya.device_name,
  ya.device_version,
  ya.layout_engine_class,
  ya.layout_engine_name,
  ya.layout_engine_name_version,
  ya.layout_engine_name_version_major,
  ya.layout_engine_version,
  ya.layout_engine_version_major,
  ya.operating_system_class,
  ya.operating_system_name,
  ya.operating_system_name_version,
  ya.operating_system_version

from ""snowplow"".""atomic"".""nl_basjes_yauaa_context_1"" ya

inner join ""snowplow"".""development_scratch"".""snowplow_web_page_view_events"" pv
on ya.root_id = pv.event_id
and ya.root_tstamp = pv.collector_tstamp

where ya.root_tstamp >= (select lower_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  and ya.root_tstamp <= (select upper_limit from ""snowplow"".""development_scratch"".""snowplow_web_pv_limits"")
  );"

Contents of the snowplow_web_incremental_manifest table:

model,last_success
snowplow_web_pv_ua_parser,2021-12-02 19:14:41.872000
snowplow_web_users_lasts,2021-12-02 19:14:41.872000
snowplow_web_user_mapping,2021-12-02 19:14:41.872000
snowplow_web_users_this_run,2021-12-02 19:14:41.872000
snowplow_web_sessions_aggs,2021-12-02 19:14:41.872000
snowplow_web_sessions_lasts,2021-12-02 19:14:41.872000
snowplow_web_users_sessions_this_run,2021-12-02 19:14:41.872000
snowplow_web_pv_yauaa,2021-12-02 19:14:41.872000
snowplow_web_pv_limits,2021-12-02 19:14:41.872000
snowplow_web_sessions_this_run,2021-12-02 19:14:41.872000
snowplow_web_sessions,2021-12-02 19:14:41.872000
snowplow_web_page_view_events,2021-12-02 19:14:41.872000
snowplow_web_page_views_this_run,2021-12-02 19:14:41.872000
snowplow_web_pv_engaged_time,2021-12-02 19:14:41.872000
snowplow_web_page_views,2021-12-02 19:14:41.872000
snowplow_web_users,2021-12-02 19:14:41.872000
snowplow_web_pv_scroll_depth,2021-12-02 19:14:41.872000
snowplow_web_users_aggs,2021-12-02 19:14:41.872000

Going to kill those 3 processes and re-run the snowplow web run.

Ok… that’s promising: run 7 is starting with the old events again - 12:50:32 + Snowplow: Processing data between '2021-12-02 13:14:41' and '2022-01-01 19:14:35' - let’s see if this works?

ETA: That run succeeded in 6 minutes. Will do more runs and see if there’s more data.

I seem to be having fewer issues this time. Wondering if it’s because I’m just running the web models - not mobile and my own models too? Fewer chances to get out of order?

ETA2: Run 10: it’s stuck in yauaa again - still going an hour later. This one didn’t fail, but I see the same 3 copies of the query in pgsql. Going to cancel them and try the run again.

Run 10 (14:03:17 + Snowplow: Processing data between '2022-03-02 13:14:08' and '2022-03-21 14:03:16') - STUCK. Cancelled it in the DB, which caused an actual failure in dbt Cloud.

2022-03-21T15:06:49.736336Z: 15:06:49  Postgres adapter: Postgres error: canceling statement due to user request

2022-03-21T15:06:49.736577Z: 15:06:49  On model.snowplow_web.snowplow_web_pv_yauaa: ROLLBACK
2022-03-21T15:06:49.823024Z: 15:06:49  finished collecting timing info
2022-03-21T15:06:49.823262Z: 15:06:49  On model.snowplow_web.snowplow_web_pv_yauaa: Close
2022-03-21T15:06:49.823904Z: 15:06:49  Database Error in model snowplow_web_pv_yauaa (models/page_views/scratch/default/snowplow_web_pv_yauaa.sql)
  canceling statement due to user request
  compiled SQL at target/run/snowplow_web/models/page_views/scratch/default/snowplow_web_pv_yauaa.sql
2022-03-21T15:06:49.824189Z: 15:06:49  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'd8aaa085-56a2-416e-bb3f-30b5cd68be72', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f0cd18e2ac0>]}
2022-03-21T15:06:49.824936Z: 15:06:49  13 of 24 ERROR creating table model development_scratch.snowplow_web_pv_yauaa... [ERRORe in 3760.72s]
2022-03-21T15:06:49.825069Z: 15:06:49  Finished running node model.snowplow_web.snowplow_web_pv_yauaa
Database Error in model snowplow_web_pv_yauaa (models/page_views/scratch/default/snowplow_web_pv_yauaa.sql)
  canceling statement due to user request
  compiled SQL at target/run/snowplow_web/models/page_views/scratch/default/snowplow_web_pv_yauaa.sql

And this is the state of the manifest table

model,last_success
snowplow_web_users_lasts,2022-03-02 19:14:08.991000
snowplow_web_users_this_run,2022-03-02 19:14:08.991000
snowplow_web_sessions_aggs,2022-03-02 19:14:08.991000
snowplow_web_sessions_lasts,2022-03-02 19:14:08.991000
snowplow_web_users_sessions_this_run,2022-03-02 19:14:08.991000
snowplow_web_pv_yauaa,2022-03-02 19:14:08.991000
snowplow_web_sessions_this_run,2022-03-02 19:14:08.991000
snowplow_web_sessions,2022-03-02 19:14:08.991000
snowplow_web_page_views_this_run,2022-03-02 19:14:08.991000
snowplow_web_page_views,2022-03-02 19:14:08.991000
snowplow_web_users,2022-03-02 19:14:08.991000
snowplow_web_users_aggs,2022-03-02 19:14:08.991000
snowplow_web_pv_ua_parser,2022-03-21 14:03:16.214000
snowplow_web_user_mapping,2022-03-21 14:03:16.214000
snowplow_web_pv_limits,2022-03-21 14:03:16.214000
snowplow_web_page_view_events,2022-03-21 14:03:16.214000
snowplow_web_pv_engaged_time,2022-03-21 14:03:16.214000
snowplow_web_pv_scroll_depth,2022-03-21 14:03:16.214000

Which is going to mean that run 11 will skip stuff.

BECAUSE… it ran the on-run-end job this time. That’s the difference between the runs that just timed out and failed in dbt, and the ones where I have to cancel the queries so they actually fail the step, which then runs the on-run-end stuff.

Any updates on this, @Emiel? dbt support are investigating the query issue, but I’m curious whether the last part of my post is intended behaviour - i.e. on-run-end triggering when one of the other models fails, and thus missing a chunk of incremental data?

Hey @angelsk, sorry for the delayed response. I was off sick when you first posted your reply and forgot to check back in on this since then. I appreciate this is super frustrating, and I’m really sorry about that.

Hopefully dbt support get to the bottom of the query issue; I suspect that would alleviate pretty much all of the issues we’re seeing here. On the question of whether or not this is intended behaviour, it’s a bit of a mixed answer. We do expect the on-run-end macro to run even when a particular model in a run fails; however, we check the status of each model in the run_results object and should only update the manifest table for models whose run was successful. The run_results.json file can be found in the target/ folder in the main project directory.

When a dbt run is aborted from the user side, we see that on-run-end does not trigger at all, which is to be expected. What you’ve seen is that when dbt times out a run, it correctly identifies it as failing. However, when a user cancels the query while it’s running in the database, I suspect that dbt is classifying these manually failed models as succeeded, but at the moment I’m unable to replicate this in Redshift as I don’t have permissions to kill queries in that environment. I will spin up a Postgres instance tomorrow morning and test this hypothesis, but you can also check this by looking at your run_results.json object. If the model is classified as successful there, that would indicate another bug on dbt’s side; if, however, it shows a status of anything other than success, then the problem is on our side. It would be great if you could check your run_results.json in the meantime to see what it says, but I will also provide an update from our testing tomorrow as soon as I have it. Sorry for the delay and for the incomplete response - I figured an earlier reply would be appreciated over you having to wait until tomorrow!

Sorry again and thanks for bearing with me,
Emiel

Sorry to hear that, hope you’re feeling better <3

Handily I took a screengrab, @Emiel - you can see dbt reported it as a failure, skipped the rest, but still ran the on-run-end step. Then the DB looked like what I posted at the end of the last post.


I didn’t know about run_results.json - I’ll see if it’s still there… but I’m assuming it’s just a JSON representation of this visual report.

Ok - so I don’t have the run results from that run, but I do have them for the run that left the manifest table looking like this (when I was testing stuff for dbt), which is the same situation.

model,last_success
snowplow_web_users_lasts,2021-12-02 19:14:41.872000
snowplow_web_users_this_run,2021-12-02 19:14:41.872000
snowplow_web_sessions_aggs,2021-12-02 19:14:41.872000
snowplow_web_sessions_lasts,2021-12-02 19:14:41.872000
snowplow_web_users_sessions_this_run,2021-12-02 19:14:41.872000
snowplow_web_pv_yauaa,2021-12-02 19:14:41.872000
snowplow_web_sessions_this_run,2021-12-02 19:14:41.872000
snowplow_web_sessions,2021-12-02 19:14:41.872000
snowplow_web_page_views_this_run,2021-12-02 19:14:41.872000
snowplow_web_page_views,2021-12-02 19:14:41.872000
snowplow_web_users,2021-12-02 19:14:41.872000
snowplow_web_users_aggs,2021-12-02 19:14:41.872000
snowplow_web_pv_ua_parser,2022-01-01 19:14:35.134000
snowplow_web_user_mapping,2022-01-01 19:14:35.134000
snowplow_web_pv_limits,2022-01-01 19:14:35.134000
snowplow_web_page_view_events,2022-01-01 19:14:35.134000
snowplow_web_pv_engaged_time,2022-01-01 19:14:35.134000
snowplow_web_pv_scroll_depth,2022-01-01 19:14:35.134000

Couldn’t upload it here, so I’ve stuck it in Google Drive: Google Drive: Sign-in (you’ll need to request access - I didn’t want the logs to be public)

Hey @angelsk,

Somehow, after investigating and trying to replicate what you’ve seen, I once again find myself stumped. I’ve dug into my own run results, running dbt against a Postgres instance both on my local machine and on dbt Cloud, and have found that when killing queries on the database side, dbt correctly identifies them as failed (both in the UI, as you’ve shown, and in the run_results.json object), and the manifest macro updates the manifest correctly as a result. Therefore, I’ve been unable to replicate the behaviour where the manifest updates with the latest processed-events timestamp for models that have failed.

However, looking back at the manifest table from your 13th post (which I’ve quoted below), we can see that the last_success for the _pv_yauaa model is earlier than the last_success of the _pv_scroll_depth model and some others, so I think it might actually be working correctly.

model,last_success
snowplow_web_users_lasts,2022-03-02 19:14:08.991000
snowplow_web_users_this_run,2022-03-02 19:14:08.991000
snowplow_web_sessions_aggs,2022-03-02 19:14:08.991000
snowplow_web_sessions_lasts,2022-03-02 19:14:08.991000
snowplow_web_users_sessions_this_run,2022-03-02 19:14:08.991000
snowplow_web_pv_yauaa,2022-03-02 19:14:08.991000
snowplow_web_sessions_this_run,2022-03-02 19:14:08.991000
snowplow_web_sessions,2022-03-02 19:14:08.991000
snowplow_web_page_views_this_run,2022-03-02 19:14:08.991000
snowplow_web_page_views,2022-03-02 19:14:08.991000
snowplow_web_users,2022-03-02 19:14:08.991000
snowplow_web_users_aggs,2022-03-02 19:14:08.991000
snowplow_web_pv_ua_parser,2022-03-21 14:03:16.214000
snowplow_web_user_mapping,2022-03-21 14:03:16.214000
snowplow_web_pv_limits,2022-03-21 14:03:16.214000
snowplow_web_page_view_events,2022-03-21 14:03:16.214000
snowplow_web_pv_engaged_time,2022-03-21 14:03:16.214000
snowplow_web_pv_scroll_depth,2022-03-21 14:03:16.214000

I also see this in my own testing, and we see the same behaviour in your latest shared manifest table:

model,last_success
snowplow_web_users_lasts,2021-12-02 19:14:41.872000
snowplow_web_users_this_run,2021-12-02 19:14:41.872000
snowplow_web_sessions_aggs,2021-12-02 19:14:41.872000
snowplow_web_sessions_lasts,2021-12-02 19:14:41.872000
snowplow_web_users_sessions_this_run,2021-12-02 19:14:41.872000
snowplow_web_pv_yauaa,2021-12-02 19:14:41.872000
snowplow_web_sessions_this_run,2021-12-02 19:14:41.872000
snowplow_web_sessions,2021-12-02 19:14:41.872000
snowplow_web_page_views_this_run,2021-12-02 19:14:41.872000
snowplow_web_page_views,2021-12-02 19:14:41.872000
snowplow_web_users,2021-12-02 19:14:41.872000
snowplow_web_users_aggs,2021-12-02 19:14:41.872000
snowplow_web_pv_ua_parser,2022-01-01 19:14:35.134000
snowplow_web_user_mapping,2022-01-01 19:14:35.134000
snowplow_web_pv_limits,2022-01-01 19:14:35.134000
snowplow_web_page_view_events,2022-01-01 19:14:35.134000
snowplow_web_pv_engaged_time,2022-01-01 19:14:35.134000
snowplow_web_pv_scroll_depth,2022-01-01 19:14:35.134000

In both of the cases above, on a subsequent run of the web model the lower limit will be defined as the minimum of these last_success values in the manifest table - so for the tables above that would be 2022-03-02 19:14:08.991000 and 2021-12-02 19:14:41.872000 respectively. Therefore, on the next run it should try to catch up on all the events that were not fully processed by every model in the previous run.
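
As a rough illustration of that catch-up logic (a simplified sketch, not the package’s actual macro, and assuming the manifest schema from the logs above), the effective lower limit comes from something like:

-- simplified: the next run re-processes from the oldest last_success onwards
SELECT MIN(last_success) AS lower_limit
FROM "snowplow"."development_snowplow_manifest"."snowplow_web_incremental_manifest";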

In the run_results.json you shared via Google Drive (thanks for that, btw!) I also see that every model status is “success”, so the manifest table should be updated accordingly. Are you seeing that that is the case? Or was this run_results.json from a run where some models failed?

@Emiel, honestly I have no clue - I’ve run things so many times. I’m assuming that because the manifest table is in a wonky state there was a failure on that run, so if they’re all successes then there’s a bug or something. I am so confused why it works for everyone else but me; I really want to be able to leverage the Snowplow models and it’s so frustrating that it’s not working.

What I was seeing is that when the table was half and half like that, it jumped a batch when I started again.
I can do more runs, but at this point I’m kinda tired of it all :frowning:

@Emiel I guess we’re out of ideas. Not really sure what I should do now?