"Viewport" capture and utilization thereof in modeling


Hello All,

I’ve been catching up on Google and Yahoo research into user engagement measurement and predictive modeling, and was eager to try it out myself over the weekend. In this excellent article, the authors propose “Using viewport to measure the amount of time users spend on each portion of the webpage”. Further, the authors seem to differentiate between “Authored/Initial Content” and “Consumer Contributed/Comments and Responses” content. I also suspect one would want to treat ads and large images separately, if present.

Anyway, my question to the community is: has anyone implemented data capture to support experiments along the lines of those described in the paper? Has anyone seen or used open-source libraries, or developed proprietary ones, for browser environments that effectively log which portion of the document appeared in the browser viewport and how long it remained there? Any experience you’re willing to share?


Hi @dashirov-ga,

Thanks for sharing that paper - it’s quite interesting.

In terms of capturing what portion of the document is in the viewport, and related engagement, you can get this information for web data with standard Snowplow tracking. The enableActivityTracking setting will send page pings at set intervals - you can define how long before the first ping, and how long between each ping.

The page ping will only fire if the user has engaged with the page in that period of time (e.g. by moving the mouse or scrolling). If the user isn’t on that tab, the ping won’t fire.

Each ping sets the pp_offset fields (pp_xoffset_min, pp_xoffset_max, pp_yoffset_min and pp_yoffset_max), which give the minimum and maximum scroll offsets reached since the last page ping.

You also have information on the size of the screen being used, and the size of the document. So you can model how far the viewport has scrolled in the interval, and what part of the document they’re viewing. The scroll-depth step of the web model has an example of modeling scroll, and the time step calculates engaged time.
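As a rough illustration of the idea above, here is a minimal Python sketch that turns one page ping into the slice of the document that was visible during the interval. The field names follow the Snowplow enriched-event columns (pp_yoffset_min, pp_yoffset_max, br_viewheight, doc_height); the PagePing class and visible_range function are hypothetical names made up for this example, and this is not how the web model itself does it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PagePing:
    # Field names mirror Snowplow enriched-event columns; values are in pixels.
    pp_yoffset_min: int  # smallest vertical scroll offset since the last ping
    pp_yoffset_max: int  # largest vertical scroll offset since the last ping
    br_viewheight: int   # height of the browser viewport
    doc_height: int      # height of the whole document


def visible_range(ping: PagePing) -> Tuple[float, float]:
    """Return (top, bottom) of the region seen during the ping interval,
    as fractions of the document height between 0 and 1."""
    if ping.doc_height <= 0:
        return (0.0, 0.0)
    # The top edge of the viewport at its highest scroll position.
    top = min(max(ping.pp_yoffset_min, 0), ping.doc_height)
    # The bottom edge of the viewport at its lowest scroll position.
    bottom = min(ping.pp_yoffset_max + ping.br_viewheight, ping.doc_height)
    bottom = max(bottom, top)
    return (top / ping.doc_height, bottom / ping.doc_height)


ping = PagePing(pp_yoffset_min=500, pp_yoffset_max=1200,
                br_viewheight=800, doc_height=4000)
print(visible_range(ping))  # (0.125, 0.5)
```

The same arithmetic works horizontally with pp_xoffset_min, pp_xoffset_max, br_viewwidth and doc_width.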

If you’ve set an appropriate interval between pings, then with a bit of creativity in your query you could also model:

  • How long the user spends on each portion of the page
  • Whether a user is slowly scrolling (as if they’re reading), or scrolling too fast to be engaged with the content
  • Whether users re-read portions of the article, or scroll through it and then stop, as if they were looking for a particular piece of information
  • Time engaged over absolute time (i.e. engaged time as a % of total time on site - I’ve seen some interesting conversion results on this: users are more likely to convert if a higher % of the time between the page view starting and finishing is spent engaged)
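To make a couple of these concrete, here is a small Python sketch of the first and last items. It assumes you have already worked out, for each page ping, the (top, bottom) fraction of the document that was visible, and it assumes each ping stands for one full heartbeat of engagement, which is only an approximation. The function names, the 10 second heartbeat and the four equal sections are all assumptions for illustration, not part of Snowplow.

```python
from typing import Iterable, List, Tuple

HEARTBEAT_SECONDS = 10  # assumed interval between page pings
N_SECTIONS = 4          # assumed: split the page into four equal bands


def seconds_per_section(ranges: Iterable[Tuple[float, float]],
                        heartbeat: float = HEARTBEAT_SECONDS,
                        n_sections: int = N_SECTIONS) -> List[float]:
    """Share each ping's heartbeat evenly across the sections it overlaps."""
    totals = [0.0] * n_sections
    for top, bottom in ranges:
        overlapped = [
            i for i in range(n_sections)
            if top < (i + 1) / n_sections and bottom > i / n_sections
        ]
        for i in overlapped:
            totals[i] += heartbeat / len(overlapped)
    return totals


def engaged_fraction(n_pings: int, absolute_seconds: float,
                     heartbeat: float = HEARTBEAT_SECONDS) -> float:
    """Engaged time (pings x heartbeat) as a share of absolute time on page."""
    if absolute_seconds <= 0:
        return 0.0
    return min(n_pings * heartbeat / absolute_seconds, 1.0)


# Two pings: one covering the top half, one covering the third quarter.
print(seconds_per_section([(0.125, 0.5), (0.5, 0.75)]))  # [5.0, 5.0, 10.0, 0.0]
print(engaged_fraction(n_pings=6, absolute_seconds=120))  # 0.5
```

Splitting the heartbeat evenly is the crudest choice; weighting by how much of each section was visible, or using a shorter heartbeat, would give a finer picture at the cost of more events.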

I’m sure there’s plenty more you could think of.

Incidentally, you also get sideways scroll with this, and browser information from the useragent parser. So there are a lot of different ways you could slice the data.

If you’re looking into something like this, it would be great if you could share how things go for you and what kinds of interesting information you find.