Is there a way to join page ping events to page views?


#1

In this thread @alex mentioned that page ping events are essentially children of page view events.

Is there a way (in redshift) to join the page ping events back to the “parent” page view event? If for instance we wanted to do things like

  • look up attributes of the original page view event or
  • calculate a standard deviation of number of page pings per view (or scroll depth per view)
  • determine what percent of pageviews scroll to a certain depth on the page

same question for link_click events, is there a way to join that back to the parent page view event for answering questions like

  • is there a correlation between time on page (or scroll depth) and clicks on content recommendation links
  • looking for an “effective click through rate” where we look at the number of page views that received a recommendation click as a proportion of page views that scrolled deep enough to see the links

I feel like there is probably an obvious answer to this and I’m just not seeing it. looks like the root_id in the com_snowplowanalytics_snowplow_link_click_1 table joins back to the events table to an event of type unstruct but I don’t see any fields that allow me to join either the unstruct click event or the page ping events back to the original page view.

If there is not a way to do this using the out-of-the-box redshift table schema, is there a way for me to pass the page_view id in a custom context of the page ping and the click event?


#2

Hi @ryanrozich,

That is indeed possible. If the web page context is enabled in the Javascript tracker, a unique ID is generated with each page view and stored in the web page context: atomic.com_snowplowanalytics_snowplow_web_page_1.

All events (page views, page pings, link clicks, structured or unstrucutred events) that happen on the same “page view” will have the same page view ID.

You can expect recipes that show how to use the page view ID to, for example, calculate scroll depth later this week.

Christophe


#3

Thanks! turns out we did not have the web page context enabled in the tracker. We will get that deployed.

Appreciate the help! We’ve only been working with snowplow for a week or so and have found it incredibly useful


#4

Thanks @ryanrozich - that’s good to hear :slight_smile:


#5

Hey @ryanrozich

Just a little heads up in case you have a Single Page Application. Currently webPage context is regenerating only on page reloads, so if you have SPA which loads only once, but have different pages - they all will have same page view ID. This behavior will be changed in 2.7.0 which should be released very soon.


#6

Thanks for the heads up. In almost all cases today our tracking script is deployed on content-rich sites that don’t function as single page apps but each ‘view’ is actually a new page load, but we may see some client sites that operate in this way in the future.


#7

Hi @christophe

So yesterday we added the web page context and now I am seeing events show up in the atomic.com_snowplowanalytics_snowplow_web_page_1 table as shown below

However, I’m having trouble using this. What I expected was that the event_id of the page ping event would join to the id field of this table and the event_id of the parent pageview event would join to the root_id of this table.

When I executed the following query for the root_id I did indeed get back a page_view event

SELECT
	*
FROM
	events
WHERE
	event_id = '23dd3301-7b7d-469b-b296-90e92c2ea4bc'

But when I did the same query using the id value above I got no events returned

SELECT
	*
FROM
	events
WHERE
	event_id = '38945123-6539-4879-a7a5-123d8049b6ec'

Perhaps I’m misunderstanding how to join page pings and clicks back to the parent page view event using this table. Can you help me understand how to do this?


#8

I think that I answered my own question here, after looking at this thread:

https://groups.google.com/forum/#!topic/snowplow-user/VdMVqBdoXPE

It seems that the id in atomic.com_snowplowanalytics_snowplow_web_page_1 is not intended to join to the events table but is an id for the page view overall. So the page_view event will join to the root_id in this table (as will page_pings and other events) but the ID in the web_page context is not a foreign key that joins back to events.

@christophe & @anton your previous comment makes more sense now. I have a comment but will add to the github issue. Thanks!


#9

Hi @ryanrozich,

You can join atomic.events and atomic.com_snowplowanalytics_snowplow_web_page_1 (and all other atomic tables) on event_id = root_id. For example:

SELECT
  ev.domain_userid,
  ev.domain_sessionidx,
  ev.derived_tstamp,
  ev.event_name,
  wp.id AS page_view_id
FROM atomic.events AS ev
INNER JOIN atomic.com_snowplowanalytics_snowplow_web_page_1 AS wp
  ON ev.event_id = wp.root_id
LIMIT 250

You can think of the unstructured event and context tables in atomic as extensions of atomic.events (with a 1-to-1 join on the event/root ID). So you are correct in your last post. The ID in the web page context is the “page view ID” - all events that happen on the same page view (or load) have the same page view ID (whether it’s the initial page view event that gets sent as soon as the tracker loads, subsequent page ping events, link click events, or any structured or unstructured event).

You might need to join on ON ev.event_id = wp.root_id AND ev.collector_tstamp = wp.root_tstamp if there are some duplicates in both tables. This is an issue that is explained here: Dealing with duplicate event IDs for joining contexts to atomic.events