We are using dbt Cloud to host our installation, and we have set up the snowplow_web package and can run it successfully. The builds pass and the supporting tables (derived, manifest, scratch) are all created in their respective schemas; however, none of the tables are being populated. The package keeps producing zero rows.
We are running BigQuery.
Here is our project yml entry for the package:
Realized I should add more information.
Hey @CSlovak, welcome to the Snowplow community!
I think what might be happening is that you didn't set the `snowplow__start_date` value, which means the web package is trying to process data from the default start date (2020-01-01). It also looks like you have the default value for `snowplow__backfill_limit_days`, which is 30. Together, this means the web package will be looking for data to process in the date range 2020-01-01 to 2020-01-31. Since there is (presumably) no data for this date range, the web package creates empty tables but does not update its manifest to say that it has processed data in this range, because no data was actually processed. As a result, on the next run the web package once again searches the same date range for data to process, and again finds nothing. To resolve this immediately (if I understand the problem correctly), you'll need to update your `snowplow__start_date` value to something more recent, such as the date you first started generating Snowplow data.
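For illustration, the date-range arithmetic above can be modeled in a few lines of Python. This is a simplified sketch, not the package's actual SQL: `processing_window` is a hypothetical helper, and it assumes the scanned range is just its lower bound plus `snowplow__backfill_limit_days`.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def processing_window(
    start_date: date,
    backfill_limit_days: int,
    last_processed: Optional[date] = None,
) -> Tuple[date, date]:
    """Return the (lower, upper) date range scanned in one run.

    Hypothetical, simplified model: with no manifest entry the run starts at
    start_date; otherwise it resumes from the last processed date. The range
    never extends more than backfill_limit_days past its lower bound.
    """
    lower = last_processed if last_processed is not None else start_date
    upper = lower + timedelta(days=backfill_limit_days)
    return lower, upper


if __name__ == "__main__":
    # Package defaults described above: start 2020-01-01, 30-day backfill limit.
    lower, upper = processing_window(date(2020, 1, 1), 30)
    print(lower, upper)  # 2020-01-01 2020-01-31
```

With a hypothetical start date of 2022-06-01, the same call returns the range 2022-06-01 to 2022-07-01, which would pick up real events if tracking began around then.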
If you're interested, here's a high-level explanation of how our package works and why it runs into this problem. We use a series of macros to generate and maintain a manifest table, which keeps track of each "actual" table that the web package generates (in the derived schemas) and the latest timestamp of data processed for that table. This lets us easily "catch up" if part of a dbt run fails, and it also ensures that, without changing any parameters in the dbt project, the web tables stay as up to date as possible with every run (assuming you run it frequently enough, which with the default settings means more often than every 30 days). However, when there is no data in the source tables to process, the manifest is deliberately not updated, so that late-arriving data loaded into the events table can still be picked up on a later run.
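The manifest behavior described above can be simulated as a toy loop in Python. This is a hypothetical sketch, not the package's implementation: `run_once`, the `"page_views"` manifest key and the event dates are all made up, and events are reduced to plain dates. It shows why repeated runs against an empty window never advance.

```python
from datetime import date, timedelta
from typing import Dict, List


def run_once(
    manifest: Dict[str, date],
    model: str,
    events: List[date],
    start_date: date,
    backfill_limit_days: int,
) -> int:
    """Simulate one dbt run for `model` and return how many events it processed.

    Hypothetical, simplified model of the manifest behavior described above:
    the manifest entry only moves forward when the run actually found data.
    """
    lower = manifest.get(model, start_date)
    upper = lower + timedelta(days=backfill_limit_days)
    batch = [e for e in events if lower <= e < upper]
    if batch:
        # Record the latest timestamp processed so the next run resumes from it.
        manifest[model] = max(batch)
    # With an empty batch the manifest is left untouched, so the next run
    # scans exactly the same window again.
    return len(batch)


if __name__ == "__main__":
    events = [date(2023, 3, 1), date(2023, 3, 5)]  # made-up event dates
    manifest: Dict[str, date] = {}
    for _ in range(3):
        print(run_once(manifest, "page_views", events, date(2020, 1, 1), 30))  # 0 each time
    print(manifest)  # {}
```

Calling `run_once` again with a start date of 2023-03-01 processes both events and sets the manifest entry to 2023-03-05, after which later runs resume from that date.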
I hope this clarifies things and helps resolve your issue, but if it doesn't, or if you have any more questions, don't hesitate to let me know!
Have a great day,