Data collection: the essential, but unloved, foundation of the data value chain


#1

Earlier today I published a post taking a long hard look at data collection:

It pulls together a lot of the thinking that underpins the architectural decisions we made building Snowplow. I’d love your feedback! :slight_smile:


#2

Re: 2.2 Data is easy to understand
In my experience, this is one of the hardest ones: achieving a good balance between the developers’ and the business administration’s points of view. Developers think in abstractions. They think a click is a click no matter what business connotation it may carry. For example, they may come up with UI control tracking modeled as follows:
Any interaction with a UI control will be logged with UI control ID, Label, Action. Or think React - state before, action, state after the action was performed. They do it because that’s how their brains work. It then becomes incredibly hard to interpret the event logs - yes, all the data is present and the business can be described and analyzed with the data collected, but it requires a lot of mental work to reconstruct the business meaning hidden in these event logs.

On the other hand, when business administration starts dictating what is modeled and how, they place a lot of temporary and overloaded concepts into the models, creating repetitive, unnecessarily complex models that eventually become bloated with technical debt and require an extremely verbose change log to explain the rationale and applicability of the model attributes to the current state of affairs.

I have been around for the development of hundreds upon hundreds of browser, desktop and mobile applications - most of them successful, with wide user adoption. I’ve noticed this pattern repeat over and over again. It takes a few lifers who’ve been around for a decade or two to help both sides find the right balance.
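
To make the contrast concrete, here is a minimal sketch in Python of the two modeling styles. All class and field names (`UiControlEvent`, `AddToBasketEvent`, `sku`, and so on) are hypothetical and only for illustration; they are not taken from any real tracker or schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UiControlEvent:
    """Developer-style event: every interaction has the same generic shape."""
    control_id: str
    label: str
    action: str


@dataclass(frozen=True)
class AddToBasketEvent:
    """Business-style event: the meaning is explicit in the record itself."""
    sku: str
    quantity: int
    unit_price: float


# The same user action, recorded in both styles (values are made up).
generic = UiControlEvent(control_id="btn-4711", label="Add", action="click")
business = AddToBasketEvent(sku="SKU-123", quantity=2, unit_price=9.99)

print(asdict(generic))
print(asdict(business))
```

The first record is complete but says nothing about what happened in business terms; an analyst has to know that `btn-4711` is the add-to-basket button. The second carries its meaning with it, at the cost of needing a new event type for each business concept.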

How does your experience help you get it right? Can you share your methods?


#3

Hi @dashirov - many thanks for sharing your experiences. They certainly chime with what we’ve seen.

It sounds corny, but the two things that really help here are:

  1. Developers and analysts working closely to negotiate the specs. My preference is that the analysts (i.e. data consumers) actually own the specs, but that may be because I myself am an analyst.
  2. Making sure that the specification is general enough. It needs to have business meaning (which should be baked into the data at collection time), but hopefully this can be done in a generalizable way, so that the burden on the developers of getting tracking right isn’t too heavy, especially when rolling it out in new environments. To take your example: there’s a balance to be struck between treating every click the same (the dev approach) and treating every element in the user experience as unique. Hopefully there are a handful of categories into which all “clicks” can be classified, which give data consumers the meaning they require without driving the devs too crazy instrumenting tracking.
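
As a rough illustration of the second point, the "handful of categories" idea could be sketched like this in Python. The category names, the control ID prefixes and the `categorize_click` helper are all assumptions made up for the example, not part of any existing spec or tracker.

```python
from dataclasses import dataclass
from enum import Enum


class InteractionCategory(Enum):
    """A small, hypothetical set of business-meaningful click categories."""
    NAVIGATION = "navigation"
    SEARCH = "search"
    CONVERSION = "conversion"
    OTHER = "other"


# Hypothetical convention: the control ID prefix encodes the category.
PREFIX_TO_CATEGORY = {
    "nav-": InteractionCategory.NAVIGATION,
    "search-": InteractionCategory.SEARCH,
    "buy-": InteractionCategory.CONVERSION,
}


@dataclass(frozen=True)
class CategorizedClick:
    control_id: str
    label: str
    category: InteractionCategory


def categorize_click(control_id: str, label: str) -> CategorizedClick:
    """Attach a category to a click, falling back to OTHER if none matches."""
    for prefix, category in PREFIX_TO_CATEGORY.items():
        if control_id.startswith(prefix):
            return CategorizedClick(control_id, label, category)
    return CategorizedClick(control_id, label, InteractionCategory.OTHER)


print(categorize_click("nav-home", "Home"))
print(categorize_click("buy-now", "Buy now"))
```

Developers still instrument one generic "click" call, while analysts get a category they can reason about. Where the mapping lives (in the tracking code, as here, or in the spec the analysts own) is a design choice for the two sides to negotiate.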