Should this be built as an enricher?

Hi guys, we are new to Snowplow, but very excited to start using the platform. We have a use case that we’d like to validate before building it on Snowplow.

Imagine we have a front-end app sending events to the server every 3 seconds while certain content is being displayed on the screen - like a pulse/heartbeat. Depending on the size of the content (which we know in the back-end), we need to wait for X heartbeats until we consider enough time has passed for the content to be properly “viewed”.

So, basically, if the video is 60 seconds long, we’d need to see about 20 heartbeats to consider it “watched”.
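
For illustration, the arithmetic above can be written as a small helper. This is only a sketch: the function name and the 3-second default are assumptions taken from the example in this post, not part of any Snowplow API.

```python
import math


def required_heartbeats(content_seconds: float, interval_seconds: float = 3.0) -> int:
    """Heartbeats needed to cover the whole content (hypothetical helper)."""
    # Round up so that a partial final interval still requires one more beat.
    return math.ceil(content_seconds / interval_seconds)


print(required_heartbeats(60))  # 20
```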

We used to do this in Dataflow, but now we are considering moving the whole thing to Snowplow.

What we thought was to create an “enrichment” API that Snowplow would call for each beat and the API would have its logic to decide whether the content has been consumed or not. Does this architecture make sense? Still seems like we are missing something.

Luis

@lfnovo, if the backend “knows” whether the content has been fully viewed, you do not need to track every heartbeat. Just track the completed view server-side. That is, track server-side instead of client-side, unless you have no control over the backend app.

No no no… the backend knows that 20 beats = “content viewed”. But it does not have any knowledge of the beats. The beats are known to Snowplow only.

@lfnovo, it doesn’t change the server-side tracking solution. Do you need to track heartbeats as well? Your client-side app sends heartbeats to the server, so you do not need Snowplow for that. Once the server-side app detects the expected number of heartbeats, it would send Snowplow a “viewed” event.

The thing is that we wanted it to be log-based because we will add support for offline viewing soon. So the client would retain the beats until it reconnects and then dispatch them all at once. Makes sense?

Hi @lfnovo, welcome to the Snowplow community!

Full disclosure, I’m coming at this from a high level, but I think you have a couple of different options.

I get the use case, I think. “Watched” doesn’t mean the video has been served, it means the user stuck around long enough to digest it. My understanding is that you could:

  1. Solve it downstream through modelling by rolling up n heartbeats into a “watched” state.
  2. Instrument the logic outside of the tracker and send a different “watched” event after you’ve clocked n heartbeats.

The advantage of 1) is that you can decide later how many heartbeats constitute “watched”. The advantage of 2) is that it’s simple to put in place. A great use case to explore the power of Snowplow though, so why not try both?
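
As a rough sketch of option 1), the roll-up could look like the following. In practice this would more likely be SQL in the data model; Python is used here only to show the logic. The field names view_id and content_id and the threshold lookup are assumptions for illustration. event_id is used to drop duplicates, which matters if an offline client re-sends a batch.

```python
from collections import defaultdict


def watched_views(events, required_by_content):
    """Return the view_ids whose distinct heartbeat count meets the threshold."""
    seen = defaultdict(set)  # (view_id, content_id) -> set of event_ids
    for e in events:
        seen[(e["view_id"], e["content_id"])].add(e["event_id"])
    return {
        view_id
        for (view_id, content_id), ids in seen.items()
        if len(ids) >= required_by_content[content_id]
    }


# Hypothetical data: view v1 has 20 heartbeats, view v2 only 5.
events = [{"event_id": f"e{i}", "view_id": "v1", "content_id": "c1"} for i in range(20)]
events += [{"event_id": f"x{i}", "view_id": "v2", "content_id": "c1"} for i in range(5)]

print(watched_views(events, {"c1": 20}))  # {'v1'}
```

Changing the definition of “watched” later is then just a matter of calling the same function with a different threshold over the same events.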

The main thing stopping you from doing this in enrich is that it’s an atomic process: each event is processed independently. So the enricher doesn’t hold any state about the events that have already passed through. You won’t have a record at enrichment time to say that n heartbeats have been processed.

We’ve got some really cool page ping aggregation functionality coming soon, but I don’t think it quite matches your use case because I’m assuming your heartbeat starts firing once the video has been asked to play.

Hope this helps. Have fun!


As Steve has mentioned above, the enrichment process is stateless (that is, it has no awareness of events that arrived before the current one), so your use case would be difficult to implement as an enrichment at the moment, as it would require some kind of windowing functionality.

There are a few options you could look at, including:

  • Performing this in the downstream data modelling process
  • Performing this in a specific Dataflow job that reads off the enriched PubSub topic
  • Performing this in the client directly - e.g., sending video pings but also accumulating time in the same event such that total_time_watched += ping_duration.
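
The third option could be sketched roughly as below. A real front end would most likely be written in JavaScript with a Snowplow tracker; this Python version only illustrates the shape of the payload. The class name, the event name and the field names are all assumptions.

```python
class ViewTimer:
    """Accumulates watch time on the client and attaches it to every ping."""

    def __init__(self, ping_duration=3):
        self.ping_duration = ping_duration  # seconds between pings
        self.total_time_watched = 0

    def ping(self):
        # total_time_watched += ping_duration, as described above.
        self.total_time_watched += self.ping_duration
        # Hypothetical payload; in practice this would be handed to the tracker.
        return {"event": "video_ping", "total_time_watched": self.total_time_watched}


timer = ViewTimer()
payloads = [timer.ping() for _ in range(20)]
print(payloads[-1])  # {'event': 'video_ping', 'total_time_watched': 60}
```

Because each ping carries the running total, a downstream consumer only needs the latest event for a view rather than the full history.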

Thank you guys, very useful answers. Yes, processing downstream and processing through Dataflow could be possible alternatives. Just to nail the enricher idea down, what I was thinking is that I could use the IDs of the transaction (which are present in the logs) and keep the state on the Enrich API itself. Basically, have an API connected to an in-memory key-value database that counts the beats and then pushes the completion event to our event bus. Would that make sense, Snowplowy-wise?

If you want to go down that road, take a look at the API enrichment. I do think solving this upstream in your tracking or downstream in modelling would be the more “Snowplowy” approach, and I expect it would also be faster and more cost-efficient.

So my two cents:

what I was thinking is that I could use the IDs of the transaction (which are present in the logs) and keep the state on the Enrich API itself. Basically, have an API connected to an in-memory key-value database that counts the beats and then pushes the completion event to our event bus. Would that make sense, Snowplowy-wise?

It makes sense in that it would probably work, yes. However it sounds like more work and cost than I’d really want to take on for this kind of task. If I were designing this solution, I’d want to avoid maintaining additional resources and making all those API calls if I could.
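
To make the trade-off concrete, here is a minimal sketch of the stateful counter the quoted idea implies. A plain dict stands in for the key-value database and a callback stands in for the event bus; the class, its method names and the event shape are hypothetical and not part of Snowplow.

```python
class BeatCounter:
    """Counts heartbeats per transaction and publishes one completion event."""

    def __init__(self, publish):
        self._counts = {}        # stand-in for the key-value store
        self._completed = set()  # transactions already reported as viewed
        self._publish = publish  # stand-in for the event bus client

    def on_heartbeat(self, transaction_id, required):
        """Record one beat; return True only when this beat completes the view."""
        if transaction_id in self._completed:
            return False
        count = self._counts.get(transaction_id, 0) + 1
        self._counts[transaction_id] = count
        if count >= required:
            self._completed.add(transaction_id)
            self._publish({"type": "content_viewed", "transaction_id": transaction_id})
            return True
        return False


published = []
counter = BeatCounter(published.append)
results = [counter.on_heartbeat("t1", required=3) for _ in range(5)]
print(results)         # [False, False, True, False, False]
print(len(published))  # 1
```

Even in this toy form the service has to own the counts, the completion flags and their expiry, which is the extra maintenance referred to above.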

If the requirement is for general reporting purposes, the most Snowplow-y way is to report heartbeat events as individual occurrences, agnostic to previous state, and have all of the logic of what to do with it live in the data model. That brings the advantage of flexibility in future - if you change your logic or realise that looking at differences between definitions of ‘viewed’ is advantageous, it’s just a matter of aggregating over the dataset again.

If you need the information in real time, a potential solution is to have the client report the heartbeat number with each heartbeat event. The server/client could also report how many heartbeats are required for each video to be considered ‘viewed’ (in, say, a video/content custom context). This way anything downstream that needs to know the ‘viewed status’ of each heartbeat can just compare these values without needing to know about any other events.
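
A minimal sketch of that comparison, assuming each event carries a heartbeat_number field and a content context with a required_heartbeats field (both names are made up for illustration):

```python
def is_viewed(event):
    """Decide 'viewed' from a single event, with no knowledge of earlier events."""
    return event["heartbeat_number"] >= event["content"]["required_heartbeats"]


# Hypothetical events for a 60-second video with a 3-second heartbeat.
early = {"heartbeat_number": 7, "content": {"required_heartbeats": 20}}
final = {"heartbeat_number": 20, "content": {"required_heartbeats": 20}}

print(is_viewed(early))  # False
print(is_viewed(final))  # True
```

Since the check needs nothing beyond the event itself, it fits the stateless, one-event-at-a-time processing discussed earlier in the thread.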

Best,

Thank you all for the comments. We decided to not use the enrichment approach and will handle this on the client side as suggested. Thank you very much.

Let us know how you get on!