Configurable Enrichments - Need a bit of help


#1

Hello,

We have been trying to extract a cookie value. Our cookies are URL encoded JSON serialized objects. Ideally we would extract our cookie from raw event headers, then URL decode it to yield a JSON serialized object. We would then convert this object to a derived context and attach it to the original event.
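For illustration, the decode step described above could look roughly like the following minimal sketch in plain JavaScript. The schema URI `iglu:com.example/session_cookie/jsonschema/1-0-0` and the function name `cookieToContext` are hypothetical placeholders, not part of any Snowplow API, and the sketch assumes the cookie value was encoded with percent-encoding (a `+` would not be turned back into a space).

```javascript
// Hypothetical schema URI, used only for illustration.
var SCHEMA = "iglu:com.example/session_cookie/jsonschema/1-0-0";

// Turn a URL-encoded, JSON-serialized cookie value into a
// self-describing JSON object ({ schema, data }).
// Returns null when the value cannot be decoded or parsed.
function cookieToContext(rawCookieValue) {
  var data;
  try {
    // decodeURIComponent throws URIError on malformed escapes,
    // JSON.parse throws SyntaxError on invalid JSON.
    data = JSON.parse(decodeURIComponent(rawCookieValue));
  } catch (e) {
    return null;
  }
  // Only accept a JSON object, since a context's data must be an object.
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return null;
  }
  return { schema: SCHEMA, data: data };
}

// Example with a made-up cookie payload.
var sample = encodeURIComponent(JSON.stringify({ userId: "abc", plan: "pro" }));
console.log(JSON.stringify(cookieToContext(sample)));
```

Returning null on bad input rather than throwing is a design choice for the sketch; whether a malformed cookie should be skipped or should fail the event depends on the pipeline.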

Unfortunately, the JS configurable enrichment receives an EnrichedEvent, not the raw event. So we opted to set up a cookie extractor enrichment and a follow-up JS enrichment that looks for the derived contexts where the cookie extractor was expected to store its results. To our dismay, event.getDerived_contexts() returns null all the time.

We are in fact using a Scala collector and do see the cookie extractor output in the enriched data; we just can’t quite get to it from inside the JS enricher’s process(event) function. Can we affect the registry in any way to ensure that the JS enricher runs after the cookie extractor enricher? Are these processes running in parallel or sequentially? Would it make sense for the JS enricher to implement both process_enriched(EnrichedEvent event) and process_raw(SnowplowRawEvent event), or to have a lifecycle we could attach to?

Any suggestions would be much appreciated!

Sincerely,
David.


#2

Also, while the cookie extractor enrichment places its output into derived_contexts, other configurable enrichments place their output into contexts. We see and have access to the following via event.getContexts():

  • iglu:com.google.analytics/cookies/jsonschema/1-0-0
  • iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0
  • iglu:com.optimizely.optimizelyx/summary/jsonschema/1-0-0

but the following appears in neither event.getContexts() nor event.getDerived_contexts():

  • iglu:org.ietf/http_cookie/jsonschema/1-0-0

It is also logical to assume that the cookie extractor, operating on the raw event, would be processed before the JS enricher, operating on the enriched event, isn’t it?


#3

As far as I know it’s not possible (yet) to control the order in which enrichments are run.

Previous thread here


#5

Hello @dashirov,

Unfortunately @mike is right - all enrichments run sequentially right now and there’s no way to control their order yet.

Also, while cookie extractor enrichment places its output into derived_contexts other configurable enrichments place their output into contexts.

I’m not sure I understand this point (or there’s some implementation bug we’re missing), but no enrichments place their output into contexts. The contexts property is populated solely by trackers/client side (“eager join” to the event) and should not be modified by enrichments (“middle join”). All enrichments should append data to derived_contexts. The schemas you mentioned are exact examples of an eager/client-side join. If I’m missing something, please let us know.

We have plans for a major overhaul of the enrichment process that will allow you to declare a specific order, have more than one instance of each configurable enrichment, and much more, but right now I can think of only one way to achieve what you want.

For reference, here’s the order of the last, most customizable enrichments:

  1. JS
  2. Cookie extractor
  3. HTTP header extractor
  4. Weather
  5. SQL Query
  6. API Request

As you said, the JS enrichment right now has access only to the EnrichedEvent, but the SQL and API enrichments do have access to both types of contexts. My proposal is to build an HTTP micro-service that:

  1. Expects a URL-encoded org.ietf/http_cookie/jsonschema/1-0-0 context as a GET parameter
  2. Transforms it in the way you want
  3. Returns the prepared derived context

Whatever this service returns will be added as a derived context for later data modeling. Though this should work, I still think it can be quite fragile: if your API server isn’t capable enough and returns a 4xx/5xx or times out, the whole event will go to enriched/bad. And depending on how many of your events have an extracted cookie context (I suspect the vast majority), your service will need to be very capable and scalable.
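A minimal sketch of such a service in JavaScript (Node.js) might look like the following. Everything specific here is an assumption made for illustration: the query parameter name `cookie`, the output schema URI `iglu:com.example/decoded_cookie/jsonschema/1-0-0`, the function names, and the expectation that the incoming context is a self-describing JSON whose data.value holds the URL-encoded JSON cookie. How the API Request enrichment is configured to call the service and map its response is not shown.

```javascript
// Hypothetical output schema URI, used only for illustration.
var OUTPUT_SCHEMA = "iglu:com.example/decoded_cookie/jsonschema/1-0-0";

// Pure request handler: takes the request path plus query string and
// returns { status, body }. Kept separate from the HTTP server so it
// can be exercised without opening a socket.
function handleRequest(requestUrl) {
  // The base URL is only needed so that relative paths can be parsed.
  var params = new URL(requestUrl, "http://localhost").searchParams;
  var raw = params.get("cookie"); // searchParams URL-decodes the value once
  if (raw === null) {
    return { status: 400, body: { error: "missing cookie parameter" } };
  }
  try {
    // Step 1: parse the http_cookie context passed as a GET parameter.
    var context = JSON.parse(raw);
    // Step 2: the cookie value is itself URL-encoded JSON (assumption).
    var payload = JSON.parse(decodeURIComponent(context.data.value));
    // Step 3: return the prepared derived context.
    return { status: 200, body: { schema: OUTPUT_SCHEMA, data: payload } };
  } catch (e) {
    return { status: 400, body: { error: "could not decode cookie context" } };
  }
}

// Wires the handler to Node's built-in HTTP server. Not called here.
function startServer(port) {
  var http = require("http");
  return http.createServer(function (req, res) {
    var result = handleRequest(req.url);
    res.writeHead(result.status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(result.body));
  }).listen(port);
}
```

Calling startServer(8080) would expose the handler on port 8080. Given the point above about 4xx responses sending the whole event to enriched/bad, it may be preferable to change the error branches so that an undecodable cookie does not fail the event; the 400 responses in the sketch are only one possible choice.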

It’s up to you whether to use this approach or not, but it looks like a viable option.