We noticed something odd while calculating time spent on a page. It came up when measuring the duration of a page_id as the time between its page_view event and the max derived_tstamp of all subsequent events within that page_id.
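To make the calculation concrete, here is a minimal Python sketch of it. The event dictionaries, the two sample events and the function name are illustrative assumptions, not our actual model code; only the field names follow the ones used in this post.

```python
from datetime import datetime

def page_durations(events):
    """Seconds between each page_id's page_view and its latest event."""
    first_view = {}  # page_id -> timestamp of the earliest page_view
    last_seen = {}   # page_id -> max derived_tstamp over all events
    for e in events:
        pid, ts = e["page_id"], e["derived_tstamp"]
        if e["event_name"] == "page_view":
            first_view[pid] = min(ts, first_view.get(pid, ts))
        last_seen[pid] = max(ts, last_seen.get(pid, ts))
    return {
        pid: (last_seen[pid] - start).total_seconds()
        for pid, start in first_view.items()
    }

# Hypothetical data: one page_view, then a click on the same page 3 hours later.
events = [
    {"page_id": "p1", "sessionidx": 1, "event_name": "page_view",
     "derived_tstamp": datetime(2024, 1, 1, 9, 0, 0)},
    {"page_id": "p1", "sessionidx": 2, "event_name": "link_click",
     "derived_tstamp": datetime(2024, 1, 1, 12, 0, 0)},
]

print(page_durations(events))  # {'p1': 10800.0}
```

Nothing in this calculation looks at the session, so a page left open across a session boundary carries the whole gap in its duration.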
Scenario: A visitor comes to our site for the first time, views 1 page and then starts their work day. 3 hours later, at lunch, they click around the existing page (filters or something like that) but they don't refresh the page. What you get in this scenario is 2 sessionidx values with 1 page_view_id crossing both sessions, and an engaged time of 10,800 seconds as per the Snowplow models.
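For anyone who wants to check how common this is in their own data, a rough sketch of the check is to list every page_id that appears under more than one sessionidx. The sample rows and the function name below are made up for illustration.

```python
from collections import defaultdict

def pages_spanning_sessions(events):
    """Return page_ids seen in more than one session, with their sessionidx values."""
    sessions_by_page = defaultdict(set)
    for e in events:
        sessions_by_page[e["page_id"]].add(e["sessionidx"])
    return {
        pid: sorted(sessions)
        for pid, sessions in sessions_by_page.items()
        if len(sessions) > 1
    }

# Hypothetical rows: p1 is touched in sessions 1 and 2, p2 only in session 2.
events = [
    {"page_id": "p1", "sessionidx": 1},
    {"page_id": "p1", "sessionidx": 2},
    {"page_id": "p2", "sessionidx": 2},
]

print(pages_spanning_sessions(events))  # {'p1': [1, 2]}
```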
Why is it that once the _sp session ID ends, the page_id does not reset until the page is manually refreshed?
Outside of the persistent domain_id cookie, should the tracker not dump all current event IDs when the _sp session ends? No existing event ID should exist outside its session.
The issue is that, if the page_id is kept, it then crosses multiple sessions (example below). I haven't cherry-picked this, as it is not a small issue.
Because of this, engaged time is heavily and artificially inflated. When the Snowplow models are then applied, the roll-ups to sessions and users are compounded/multiplied, because 1 page, along with its engaged time, crosses many sessions.
We could spend time rewriting the models and putting rules in place: that no page_view can outlast the 30-minute session, and that a page_id belongs to the first session it was seen in and not to subsequent sessions. The issue is what to do with the interactions that follow. Do we manually re-key the ID? That sounds messy.
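As a rough idea of what that re-keying could look like, the sketch below treats (page_id, sessionidx) as the key instead of page_id alone, so no measured span can cross a session boundary. This is only one possible workaround, the data is hypothetical, and it is not how the Snowplow models work today.

```python
from datetime import datetime

def durations_by_page_and_session(events):
    """Seconds between first and last event for each (page_id, sessionidx) pair."""
    bounds = {}  # (page_id, sessionidx) -> (earliest tstamp, latest tstamp)
    for e in events:
        key = (e["page_id"], e["sessionidx"])
        ts = e["derived_tstamp"]
        lo, hi = bounds.get(key, (ts, ts))
        bounds[key] = (min(lo, ts), max(hi, ts))
    return {key: (hi - lo).total_seconds() for key, (lo, hi) in bounds.items()}

# Hypothetical data: a page_view at 09:00, then two clicks after lunch
# on the same, un-refreshed page (now in a second session).
events = [
    {"page_id": "p1", "sessionidx": 1, "derived_tstamp": datetime(2024, 1, 1, 9, 0, 0)},
    {"page_id": "p1", "sessionidx": 2, "derived_tstamp": datetime(2024, 1, 1, 12, 0, 0)},
    {"page_id": "p1", "sessionidx": 2, "derived_tstamp": datetime(2024, 1, 1, 12, 5, 0)},
]

print(durations_by_page_and_session(events))
# {('p1', 1): 0.0, ('p1', 2): 300.0}
```

The 3-hour idle gap disappears, but the second segment has no page_view event of its own, which is exactly the messy part.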
I feel that the tracker should probably end the page_id on session end, so that if a new session is created on the same page, the page event ID changes too. I do understand this would create a new page_view event, or possibly need to be handled by a new event type, I don't know, something like a page_view resume event.
Any thoughts/suggestions are very much appreciated.