Incremental enrichment

Hey Guys,

I need to be cost-conscious on AWS/Snowflake. Is there a way to send only incremental data?

I understand that enableActivityTrackingCallback helps with page pings, but I'm looking for something that goes further than that.
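For context, here is a rough sketch of how I understand the callback can cut down page pings: collapse them in the browser and send a single summary at the end of the page view. The `createPingAggregator` helper is hypothetical, and the event shape (`pageViewId`, `maxXOffset`, `maxYOffset`) and the object-style arguments in the commented wiring are my assumptions from the v3 tracker docs, so check them against your tracker version.

```javascript
// Hypothetical helper (not part of the Snowplow tracker): collapses many
// activity callbacks into one summary per page view.
function createPingAggregator() {
  let summary = null;
  return {
    // Intended to be passed as the `callback` of enableActivityTrackingCallback.
    // Assumes the event carries pageViewId, maxXOffset and maxYOffset.
    onActivity(event) {
      if (summary === null || summary.pageViewId !== event.pageViewId) {
        summary = { pageViewId: event.pageViewId, pings: 0, maxXOffset: 0, maxYOffset: 0 };
      }
      summary.pings += 1;
      summary.maxXOffset = Math.max(summary.maxXOffset, event.maxXOffset);
      summary.maxYOffset = Math.max(summary.maxYOffset, event.maxYOffset);
    },
    // Returns the current summary (or null if there was no activity) and resets it.
    flush() {
      const out = summary;
      summary = null;
      return out;
    },
  };
}

// Browser wiring, shown as comments because it needs the tracker on the page
// (argument names assumed from the v3 docs):
//
// const aggregator = createPingAggregator();
// snowplow('enableActivityTrackingCallback', {
//   minimumVisitLength: 10,
//   heartbeatDelay: 10,
//   callback: aggregator.onActivity,
// });
// window.addEventListener('pagehide', () => {
//   const s = aggregator.flush();
//   if (s) { /* send one summary event instead of many page pings */ }
// });
```

This only reduces the number of page_ping rows. It does nothing about the UA/IP-derived fields on the events that are still sent, which is the part I'm asking about.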

This applies to every event for a domain_user, whether structured, unstructured, page_view or page_ping. Each of these rows carries a user agent string and IP address that repeat for the whole session, and the BR/DV/OS/IP/GEO fields are then populated from them. I only really need this information on the initial page view or at the unload/end of the session. It's not an issue if the events other than the first page_view are not populated with these fields. Reducing the volume of data in the streams and in Snowflake would drastically reduce the associated cost.

While it is possible to write JS functions that limit which contexts are sent with some page_views, I was hoping there is a better way.

Thanks!

This is something we’re considering. We’ll be making the br fields optional in the next release of the JS Tracker and I imagine subsequent releases will have similar functionality. The idea of making these fields optional by event type is interesting and something I’m happy to consider further.

However, some of these fields are added automatically on the server side: IP, Geo and UserAgent all come from the HTTP headers and are parsed by the collector and enrichment steps for every event. Doing this per event type would be a big change to the collector and enrich, and it would ultimately reduce the richness of the data and increase the complexity of modelling it, so I don't see us being able to restrict these fields per event any time soon.


Hi @PaulBoocock,

Thanks for the response. I appreciate that increased complexity and reduced richness is not a road to go down, and I certainly wouldn't want that. On the optional piece, could there be a middle ground rather than a plain Y/N? Even if only page_views contained br/UA/IP/Geo, and structured events, unstructured events and pings didn't, this would make the streams and EDW significantly leaner while maintaining the richness.

Ultimately I'm heading towards billions of records annually. If I could set a flag so that, say, each page_view gets the whole header record but other events do not, this should keep the richness and make the data lean. I have domain_userid, so maintaining the richness of “static” session entities like BR/IP etc. wouldn't be an issue in this example. Even if the enrichments could identify the record type and skip BR/IP for anything but page_view, this would be a nice middle ground. Easier said than done…
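To make the idea concrete, here is a minimal sketch of what I have in mind, written as a hypothetical post-processing step rather than anything the collector or enrich offers today. The column names are the ones I see in the atomic events table. `stripSessionFields` and `rehydrate` are names I made up, and I key on domain_sessionid here on the assumption that these values are constant within a session.

```javascript
// Columns that repeat on every event in a session (subset, for illustration).
const SESSION_FIELDS = [
  'useragent', 'user_ipaddress', 'br_name', 'os_name', 'dvce_type', 'geo_country',
];

// Keep the session-level columns only on page_view rows and null them elsewhere.
function stripSessionFields(events) {
  return events.map((e) => {
    const row = { ...e };
    if (row.event !== 'page_view') {
      for (const f of SESSION_FIELDS) row[f] = null;
    }
    return row;
  });
}

// Restore the stripped columns from the first page_view of the same session.
// Rows whose session has no page_view are returned unchanged.
function rehydrate(events) {
  const firstPageView = new Map();
  for (const e of events) {
    if (e.event === 'page_view' && !firstPageView.has(e.domain_sessionid)) {
      firstPageView.set(e.domain_sessionid, e);
    }
  }
  return events.map((e) => {
    const row = { ...e };
    const src = firstPageView.get(e.domain_sessionid);
    if (row.event !== 'page_view' && src) {
      for (const f of SESSION_FIELDS) row[f] = src[f];
    }
    return row;
  });
}
```

The second function is the modelling cost mentioned above: every query that needs browser or geo data on a non-page_view event has to go through a lookup like this, and sessions without a page_view stay unpopulated.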

With all that being said, I'm loving Snowplow. I've never come across a digital analytics solution that I can build so exactly to the specifications I require.

Thanks again.
