Standardise Session Contexts

Hi,

We have implemented trackers on iOS, Android and Web.

On Web, the session ID and session count are defined in the tracker protocol, e.g. sid and vid.

On Android, when session tracking is enabled, a context is generated instead:
{"schema":"iglu:com.snowplowanalytics.snowplow\/client_session\/jsonschema\/1-0-1","data":{"sessionIndex":141,"storageMechanism":"SQLITE","firstEventId":"61be42b5-4c6a-4e8f-a29e-80e440cb8239","sessionId":"046da8d4-3863-405a-8596-619142c1f4a0","previousSessionId":"abbddab3-23fb-498d-9213-7ce18bb37273","userId":"726594da-d570-4b2b-aff8-7f59ce0e1668"}}

What would be ideal is if the session sid and vid were set and the above “client_session” context could also be added. “client_session” is quite large and adds quite significantly to the size of the payload.

Do you have an enrichment that would be an alternative for the time being?
e.g. get values from client_session, set sid/vid, remove context
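
To make the request concrete, here is a minimal Python sketch of the kind of transformation being asked for. It is not an existing Snowplow enrichment: the function name flatten_client_session and the flat event dict are hypothetical, and mapping sessionId to sid and sessionIndex to vid is an assumption about how the fields line up.

```python
CLIENT_SESSION_PREFIX = "iglu:com.snowplowanalytics.snowplow/client_session/"


def flatten_client_session(event: dict) -> dict:
    """Copy session fields from a client_session context into sid/vid.

    `event` is a hypothetical dict holding a "contexts" list of
    self-describing JSON objects; it is not the real enriched-event format.
    """
    result = dict(event)  # shallow copy so the input is not modified
    kept = []
    for context in event.get("contexts", []):
        if context.get("schema", "").startswith(CLIENT_SESSION_PREFIX):
            data = context["data"]
            # Assumed mapping: sessionId -> sid, sessionIndex -> vid.
            # setdefault keeps any sid/vid the event already carries.
            result.setdefault("sid", data["sessionId"])
            result.setdefault("vid", data["sessionIndex"])
        else:
            kept.append(context)
    result["contexts"] = kept  # client_session context removed
    return result


# Session fields taken from the Android context shown above.
event = {
    "contexts": [
        {
            "schema": "iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-1",
            "data": {
                "sessionIndex": 141,
                "sessionId": "046da8d4-3863-405a-8596-619142c1f4a0",
            },
        }
    ]
}
flat = flatten_client_session(event)
print(flat)
```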

It’s really important that our event streams look the same across the platforms.

Thanks in advance,
Rob

Nice work on getting Snowplow implemented there, Rob!

I’m not familiar with the iOS and Android trackers, but could you generate session IDs and indexes across platforms further downstream, e.g. during data modelling? Then you wouldn’t need to increase your event payloads with the client_session context.

Also, I believe web’s session IDs and counts are based on browser-standard session cookies. So calculating sessions downstream might be more configurable/consistent across web/Android/iOS.

Hi @Rob_Ellison,

The behaviour you describe is by design.

A session ID on the web and a session ID on mobile represent distinct things, and the trackers are designed to reflect that. The JavaScript tracker uses a browser cookie for sessionisation, a concept that doesn’t exist on mobile. Additionally, the logic for how session IDs are incremented is different across the two platforms.

It’s really important that our event streams look the same across the platforms.

I would consider this a requirement of your business logic. The trackers are built to send an unopinionated event stream to the pipeline, which makes no assumptions about the business logic. The data needs to be as exact a representation as possible of what’s being measured for this model to work - this is what atomic data is.

As Rob suggests above, your business logic and assumptions can be built into a data model which runs over atomic data to produce a derived output. If there is ever a question, doubt or change of opinion about the model logic, then you have a log of the atomic data to test those assumptions and recompute the logic if required.

The specific requirement in question here could be done with some simple SQL, for example, using a coalesce statement.
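
A minimal sketch of the coalesce idea, run through Python's built-in sqlite3 module so that it is self-contained. The events table, its mobile_session_id and mobile_session_index columns, and the sample web row are hypothetical stand-ins for however the client_session context is loaded in your warehouse; domain_sessionid and domain_sessionidx are the web session columns in atomic events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        event_id TEXT,
        platform TEXT,
        domain_sessionid TEXT,        -- set by the web tracker
        domain_sessionidx INTEGER,
        mobile_session_id TEXT,       -- hypothetical: from client_session
        mobile_session_index INTEGER  -- hypothetical: from client_session
    )
    """
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Illustrative web event: only the web session columns are filled.
        ("e1", "web", "web-session-1", 3, None, None),
        # Mobile event using the values from the context in the first post.
        ("e2", "mob", None, None, "046da8d4-3863-405a-8596-619142c1f4a0", 141),
    ],
)

# COALESCE returns the first non-NULL argument, so each row gets whichever
# session fields its platform actually populated.
rows = conn.execute(
    """
    SELECT event_id,
           COALESCE(domain_sessionid, mobile_session_id) AS session_id,
           COALESCE(domain_sessionidx, mobile_session_index) AS session_index
    FROM events
    ORDER BY event_id
    """
).fetchall()

for row in rows:
    print(row)
```

The same SELECT could sit in a view or a data model step, leaving the atomic data untouched.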

If it is very important to you that the data you send incorporates your assumptions, then you can generate session IDs yourself and send them as custom contexts. You can potentially also utilise server-side tracking to achieve something similar.

What would be ideal is if the session sid and vid were set and the above “client_session” context could also be added. “client_session” is quite large and adds quite significantly to the size of the payload.

So, the string you’ve sent over should add less than one kB to the payload. To date, across all of our users on mobile, that’s not been raised as a problematic amount of data to attach to events. However, we’re aware that there are certain verticals that have specific requirements around minimal payloads, and it’s a topic that’s worth exploring.
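
As a rough check on that figure, the following Python sketch measures the context string exactly as posted, both raw and base64-encoded. It assumes the context may be base64-encoded in transit, and it measures only this one context object, not the wrapper around it or the rest of the event.

```python
import base64

# The client_session context exactly as posted, escaped slashes included.
RAW_CONTEXT = (
    r'{"schema":"iglu:com.snowplowanalytics.snowplow\/client_session'
    r'\/jsonschema\/1-0-1","data":{"sessionIndex":141,'
    r'"storageMechanism":"SQLITE",'
    r'"firstEventId":"61be42b5-4c6a-4e8f-a29e-80e440cb8239",'
    r'"sessionId":"046da8d4-3863-405a-8596-619142c1f4a0",'
    r'"previousSessionId":"abbddab3-23fb-498d-9213-7ce18bb37273",'
    r'"userId":"726594da-d570-4b2b-aff8-7f59ce0e1668"}}'
)

encoded = RAW_CONTEXT.encode("utf-8")
raw_bytes = len(encoded)
# Base64 grows the data by roughly a third.
b64_bytes = len(base64.urlsafe_b64encode(encoded))

print(f"raw: {raw_bytes} bytes, base64: {b64_bytes} bytes")
```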

Perhaps you can provide a bit more information on your use case and why the payload size is a concern for you? It may well be that we have come across something similar before and can recommend an approach.

Best,

Hi,

Thanks for all your swift responses. I’d be interested to know more about the differences you mention between the cookie solution and the mobile solutions, and also about the differences between the session counts.

Isn’t this a concept that all of the trackers support? The goal is to have:

  • one value that represents a session
  • one that represents the count of sessions

How this is achieved, and the platform-specific limitations and approaches, shouldn’t have a bearing on this, should it?

A session should have a fairly standardised definition, e.g. if no events from a user have been seen for x ms, generate a new session and increment the session count.
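
That definition can be sketched in a few lines of Python. This is only an illustration of the timeout rule described above, not how any of the trackers implement it; the sessionise function is hypothetical and the 30-minute timeout is an arbitrary example value.

```python
from typing import List, Tuple


def sessionise(timestamps_ms: List[int], timeout_ms: int) -> List[Tuple[int, int]]:
    """Return (timestamp, session_index) pairs for one user's events.

    A new session starts when the gap since the previous event exceeds
    timeout_ms. Session indexes start at 1.
    """
    result = []
    session_index = 0
    previous = None
    for ts in sorted(timestamps_ms):
        if previous is None or ts - previous > timeout_ms:
            session_index += 1  # first event, or the user was idle too long
        result.append((ts, session_index))
        previous = ts
    return result


# Illustrative timeout only; not taken from any tracker's defaults.
THIRTY_MINUTES_MS = 30 * 60 * 1000

# Three events a minute apart, a gap of just over an hour, then two more.
events = [0, 60_000, 120_000, 4_000_000, 4_060_000]
sessions = sessionise(events, THIRTY_MINUTES_MS)
print(sessions)
```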

I agree that this can be done in post-processing, but having a non-standard protocol per tracker makes things generally a lot more complex. It also feels like this is fixing a problem afterwards rather than at the source.

It’s mainly that we have to pay collector, enricher and Kinesis stream costs, as well as storage costs. The less data the better.

Thanks,
Rob