API Request Enrichment with AWS

eileen_dover · August 6, 2022, 12:22am

Hello! I’m new to working with both Snowplow and APIs, and I just had a few questions regarding the set-up for API request enrichments.

My goal is to enrich each event being sent from an app with the app metadata. I currently have a REST API on AWS API Gateway that takes GET requests and returns a JSON object literal that looks like the following:

{"Item":{"json":"{ \"endpoint\": \"external\", \"team\": \"my_team\" }","app_name":"my_app"}}

This data exists in a DynamoDB table where the key is the app_name. My goal is that, when any event (custom structured events, page views, page pings, etc.) is being fired from a specific app (e.g. my_app), I could call the API with the app name as the key, and this information could be appended onto that event. I’ve looked through the tutorial here and have read through some forum posts on this site for reference, but I am having trouble understanding how to set up the configurations:

In editing the api_request_enrichment_config.json file in Phase 3, I had the following questions:

1. When would I use pojo vs json? Is this different depending on the event I’d want to be enriching (e.g. page view vs custom structured events)?
2. If I do use json, what is meant to go in field and schemaCriterion? Particularly for events like page views that don’t seem to follow a specific schema?
3. For outputs, does the outputted information get automatically appended to the Snowplow event being tracked? It seemed like in the tutorial that the returned data from the API was stored in a separate table in clearbit, but I was wondering if there was any way to attach that data to the payload of a page view/custom structured event? Is this something configured in the api_request_enrichment_config.json file? Or would it be an additional parameter when I call trackStructEvent?

Thanks so much in advance! I’m a little new to all of this, so I’m happy to clarify anything that didn’t make sense.

dilyan · August 8, 2022, 1:24pm

Hi @eileen_dover,

First, I’ll try to give you some context that will hopefully make it easier to understand my answers to your questions.

We can think of Snowplow events as being one of two types. We usually call them ‘atomic’ and ‘custom self-describing’ events (though you might see the latter also being called ‘custom unstructured events’). The main difference between the two is that self-describing events are always accompanied by information about the schema against which they must validate, whereas ‘atomic’ events do not come with a JSON schema to which they must conform.

If you think of a Snowplow event as a single line, it will have properties like app_id, event_name, etc., each associated with a specific value. If it’s an ‘atomic’ event, it might only have these simple key-value pairs. But if it’s a self-describing event, it will have an additional field called unstruct_event. The value of this field will be a JSON with two important properties: schema, which points to the JSON schema that the event must validate against; and data, which contains the actual event payload.

In addition, both types of events – atomic and self-describing – might have some custom contexts added to them by the tracker. Just like the unstruct_event, the information about these contexts will be in a separate field of the event for ‘custom context’, and the value will be a self-describing JSON (specifying both the schema for the context, and the data that must validate against this schema).

It’s important to understand that a self-describing event always has all the ‘atomic’ properties as well. It can be represented very approximately as something like:


Map(
  "app_id" -> "my_app",
  "event_name" -> "my_event_name",
  "unstruct_event" -> 
    {
      "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0", 
      "data": {"key1": "value1"}
    }
)
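As a rough illustration of the two event shapes described here, the following Python sketch models events as plain dictionaries. It is a simplification, not Snowplow's internal representation: the field values, the `com.acme/user` context schema, and the helper names `is_self_describing` and `attached_schemas` are all hypothetical, `app_id` is assumed to be the atomic field holding the application identifier, and `contexts` is flattened to a plain list.

```python
# Hypothetical, heavily simplified events; real Snowplow events carry many more fields.
page_view = {
    "app_id": "my_app",
    "event_name": "page_view",
    # Contexts are self-describing JSONs added by the tracker (flattened to a list here).
    "contexts": [
        {"schema": "iglu:com.acme/user/jsonschema/1-0-0", "data": {"plan": "free"}},
    ],
}

custom_event = {
    "app_id": "my_app",
    "event_name": "my_event_name",
    "unstruct_event": {
        "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "data": {"key1": "value1"},
    },
}


def is_self_describing(event: dict) -> bool:
    """True when the event carries an unstruct_event with both schema and data."""
    payload = event.get("unstruct_event")
    return isinstance(payload, dict) and all(k in payload for k in ("schema", "data"))


def attached_schemas(event: dict) -> list:
    """Collect every schema URI referenced by the event itself or by its contexts."""
    schemas = []
    if is_self_describing(event):
        schemas.append(event["unstruct_event"]["schema"])
    schemas.extend(ctx["schema"] for ctx in event.get("contexts", []))
    return schemas


print(is_self_describing(page_view))
print(is_self_describing(custom_event))
print(attached_schemas(custom_event))
```

Both events share the same atomic fields; only the second one has a schema of its own, while the first references a schema solely through its context.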

To answer your first question, you would use pojo when your input is one of the ‘atomic’ fields, like app_id, and json when it is part of one of the JSON fields, like unstruct_event or contexts. The choice depends on where the input value lives rather than on the type of event.

On your second question, field must specify where to look for the JSON. It must be one of unstruct_event, contexts (sent by the tracker) or derived_contexts (added by another enrichment).

The schemaCriterion must specify the schema for the event or context where the input is coming from.

For example, you might have the following inputs (matching the example above):

"inputs": [
  {
    "key": "app_name",
    "pojo": {
      "field": "app_id"
    }
  },
  {
    "key": "key1",
    "json": {
      "field": "unstruct_event",
      "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
      "jsonPath": "$.key1"
    }
  }
]
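To make the two input types more concrete, here is a minimal Python sketch of how inputs like these could be resolved against an event and substituted into a request URI through `{{key}}` placeholders. It is not the enrichment's actual implementation: the helper names, the `example.invalid` URI, the wildcard handling in `matches_criterion`, and the tiny dotted-path subset of JSONPath are all assumptions made for illustration, as is the use of `app_id` as the atomic field holding the application identifier.

```python
import re
from urllib.parse import quote


def matches_criterion(schema_uri: str, criterion: str) -> bool:
    """Compare an Iglu URI with a criterion; '*' in a version part matches anything."""
    *schema_head, schema_version = schema_uri.split("/")
    *crit_head, crit_version = criterion.split("/")
    if schema_head != crit_head:
        return False
    pairs = zip(schema_version.split("-"), crit_version.split("-"))
    return all(c == "*" or c == s for s, c in pairs)


def follow_path(document, json_path: str):
    """Resolve a dotted path such as '$.key1' (only a tiny subset of JSONPath)."""
    node = document
    for part in json_path.lstrip("$.").split("."):
        if not part:
            continue
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def resolve_inputs(event: dict, inputs: list) -> dict:
    """Look up each input: pojo reads an atomic field, json reads inside a JSON field."""
    values = {}
    for spec in inputs:
        value = None
        if "pojo" in spec:
            value = event.get(spec["pojo"]["field"])
        else:
            source = spec["json"]
            payload = event.get(source["field"])
            candidates = payload if isinstance(payload, list) else [payload]
            for candidate in candidates:
                if candidate and matches_criterion(candidate["schema"], source["schemaCriterion"]):
                    value = follow_path(candidate["data"], source["jsonPath"])
                    break
        if value is not None:
            values[spec["key"]] = value
    return values


def fill_uri(template: str, values: dict) -> str:
    """Replace each {{key}} placeholder with the URL-encoded input value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: quote(str(values[m.group(1)]), safe=""), template)


event = {
    "app_id": "my_app",
    "event_name": "my_event_name",
    "unstruct_event": {
        "schema": "iglu:com.acme/my_event_name/jsonschema/1-0-0",
        "data": {"key1": "value1"},
    },
}
inputs = [
    {"key": "app_name", "pojo": {"field": "app_id"}},
    {"key": "key1", "json": {
        "field": "unstruct_event",
        "schemaCriterion": "iglu:com.acme/my_event_name/jsonschema/1-*-*",
        "jsonPath": "$.key1",
    }},
]

values = resolve_inputs(event, inputs)
uri = fill_uri("https://example.invalid/apps/{{app_name}}?k={{key1}}", values)
print(values)
print(uri)
```

Note that the `key` of an input is just the placeholder name used in the URI, so it can differ from the event field it is read from.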

Finally, all outputs will be attached to the event, and they will go in the derived_contexts field. Like unstruct_event and contexts, this field contains self-describing JSONs, so you will need to write one or more schemas for the output (depending on whether you want to attach all the data returned from the API as a single context or as multiple contexts).
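The sketch below illustrates, in Python, what attaching an output as a derived context could look like for the API response quoted in the question. It is only an approximation of the behaviour: `build_context`, the `com.acme/app_metadata` schema, and the `$.Item` path are hypothetical, and only simple dotted paths are supported.

```python
import json

# The response from the question: note that "json" holds a string, not a nested object.
api_response = {
    "Item": {
        "json": json.dumps({"endpoint": "external", "team": "my_team"}),
        "app_name": "my_app",
    }
}

# Hypothetical output definition: which part of the response to keep, and under which schema.
output = {
    "schema": "iglu:com.acme/app_metadata/jsonschema/1-0-0",
    "json": {"jsonPath": "$.Item"},
}


def build_context(response: dict, output_spec: dict) -> dict:
    """Select part of the response and wrap it as a self-describing JSON."""
    node = response
    for part in output_spec["json"]["jsonPath"].lstrip("$.").split("."):
        if part:
            node = node[part]
    return {"schema": output_spec["schema"], "data": node}


event = {"app_id": "my_app", "event_name": "page_view", "derived_contexts": []}
event["derived_contexts"].append(build_context(api_response, output))

context = event["derived_contexts"][0]
print(json.dumps(context, indent=2))

# The metadata stays string-encoded unless the API returns it as a real object.
team = json.loads(context["data"]["json"])["team"]
print(team)
```

With a response shaped like this, the output schema would have to describe `json` as a string property. Returning the metadata as a nested object from the API instead would let the schema describe `endpoint` and `team` directly.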

You might also find this tutorial helpful, especially steps 2 and 3.
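Putting the pieces together for the use case in the question (enriching every event, including page views, by application), a configuration could look roughly like the one this Python snippet prints. Treat it as a sketch to check against the official enrichment documentation rather than a verified config: the overall layout reflects my understanding of the api_request_enrichment_config schema, while the URI, the timeout and cache values, the empty `authentication` object, and the `com.acme/app_metadata` output schema are assumptions.

```python
import json

# Hypothetical endpoint; {{app_name}} is filled from the input with the same key.
API_URI = "https://example.invalid/prod/apps/{{app_name}}"

config = {
    "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/api_request_enrichment_config/jsonschema/1-0-0",
    "data": {
        "vendor": "com.snowplowanalytics.snowplow.enrichments",
        "name": "api_request_enrichment_config",
        "enabled": True,
        "parameters": {
            # A single pojo input is enough here, because the lookup key is an atomic field.
            "inputs": [
                {"key": "app_name", "pojo": {"field": "app_id"}},
            ],
            "api": {
                "http": {
                    "method": "GET",
                    "uri": API_URI,
                    "timeout": 2000,       # milliseconds (assumed value)
                    "authentication": {},  # assumed: no basic auth on the endpoint
                }
            },
            # Hypothetical output schema; it must exist in your Iglu registry.
            "outputs": [
                {
                    "schema": "iglu:com.acme/app_metadata/jsonschema/1-0-0",
                    "json": {"jsonPath": "$.Item"},
                },
            ],
            "cache": {"size": 1000, "ttl": 60},  # assumed values
        },
    },
}

print(json.dumps(config, indent=2))
```

The printed JSON is what would go into api_request_enrichment_config.json. Nothing needs to change in the tracker calls, since the enrichment runs on every event that has the configured inputs.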
