Schema syntax for custom unstructured events

I want to track custom unstructured events, similar to this code sample:

window.snowplow_name_here('trackUnstructEvent', {
    schema: 'iglu:com.acme_company/viewed_product/jsonschema/2-0-0',
    data: {
        productId: 'ASO01043',
        category: 'Dresses',
        brand: 'ACME',
        returning: true,
        price: 49.95,
        sizes: ['xs', 's', 'l', 'xl', 'xxl'],
        availableSince: new Date(2013,3,7)
    }
});

Source: https://github.com/snowplow/snowplow/wiki/2-Specific-event-tracking-with-the-Javascript-tracker#custom-structured-events

The sample schema URL again: iglu:com.acme_company/viewed_product/jsonschema/2-0-0

My schema will be hosted at this static location: ddajv0fjp16r7.cloudfront.net

So should my schema reference look like this iglu:net.cloudfront.ddajv0fjp16r7/viewed_product/jsonschema/2-0-0?

That is – “iglu:” (the protocol), followed by my domain reversed – “net.cloudfront.ddajv0fjp16r7”.

Thanks,
Karl

Hi @karl_jones,

Generally, a schema reference looks like this:

iglu:vendor_name/event_name/jsonschema/2-0-0
---- ----------- ---------- ---------- -----
  |        |          |          |       |- schema version (model-revision-addition)
  |        |          |          |- schema format
  |        |          |- event name
  |        |- vendor of the event
  |- schema methodology
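
To make the breakdown above concrete, here is a minimal sketch that splits a schema reference into those five parts. The helper name parseIgluUri is made up for illustration and is not part of any Snowplow library; it only mirrors the layout shown in the diagram.

```javascript
// Hypothetical helper (not a Snowplow API): split an Iglu URI into its parts.
function parseIgluUri(uri) {
  const prefix = 'iglu:';
  if (!uri.startsWith(prefix)) {
    throw new Error('Not an Iglu URI: ' + uri);
  }
  const parts = uri.slice(prefix.length).split('/');
  if (parts.length !== 4) {
    throw new Error('Expected vendor/name/format/version in: ' + uri);
  }
  const [vendor, name, format, version] = parts;
  // The version is model-revision-addition, e.g. 2-0-0.
  const [model, revision, addition] = version.split('-').map(Number);
  return { vendor, name, format, version, model, revision, addition };
}

console.log(parseIgluUri('iglu:com.acme_company/viewed_product/jsonschema/2-0-0'));
```

For the sample reference, vendor comes out as com.acme_company and name as viewed_product.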

On the other hand, the JSON schema describing the event will look like:

{
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Schema for event_name",
    "self": {
        "vendor": "vendor_name",
        "name": "event_name",
        "format": "jsonschema",
        "version": "2-0-0"
    },
    "type": "object",
    "properties": {
        "productId": {
            "type": "string"
        },
        "category": {
            "type": "string"
        },
        ...
    },
    "minProperties":<<min number>>,
    "required": [<<list of required properties>>],
    "additionalProperties": <<false/true>>
}

The important part here is the self section:

"self": {
        "vendor": "vendor_name",
        "name": "event_name",
        "format": "jsonschema",
        "version": "2-0-0"
}

Again, note the properties’ values vendor_name/event_name/jsonschema/2-0-0.

Thus, your JSON schema file is expected to be located at

ddajv0fjp16r7.cloudfront.net/schemas/vendor_name/event_name/jsonschema/2-0-0

Note the folder schemas containing the same structure vendor_name/event_name/jsonschema/2-0-0. You might want to host different schemas associated with different vendors.
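
As a rough illustration of that mapping, the sketch below joins a registry host with a schema reference to get the expected file location. The function name schemaUrl is hypothetical, and it assumes a static registry that keeps everything under a schemas folder as described above.

```javascript
// Hypothetical helper: where a static registry is expected to serve a schema.
// Assumes the layout <registry>/schemas/<vendor>/<name>/<format>/<version>.
function schemaUrl(registryUri, igluUri) {
  const path = igluUri.replace(/^iglu:/, '');   // drop the iglu: prefix
  const base = registryUri.replace(/\/+$/, ''); // drop any trailing slash
  return base + '/schemas/' + path;
}

console.log(
  schemaUrl(
    'http://ddajv0fjp16r7.cloudfront.net',
    'iglu:vendor_name/event_name/jsonschema/2-0-0'
  )
);
// http://ddajv0fjp16r7.cloudfront.net/schemas/vendor_name/event_name/jsonschema/2-0-0
```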

This also means that your resolver configuration file should look something like

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Your Iglu Server",
        "priority": 0,
        "vendorPrefixes": [ "vendor_name" ],
        "connection": {
          "http": {
            "uri": "http://ddajv0fjp16r7.cloudfront.net"
          }
        }
      }
    ]
  }
}

Also, ensure you set the appropriate permissions for your JSON schema file to be accessible from the EMR cluster during the enrichment process.

Hopefully, this helps.

–Ihor

Hello Ihor,

Thanks, this is helpful. But I still don’t understand how to use the resolver configuration file.

It looks to me like the resolver file “iglu_resolver.json” pertains to the Enrichment process. I am currently setting up the Tracker, not Enrichment.

I don’t understand how to use the resolver – or if I need to use the resolver – during the Tracking and Collection phase (for Unstructured Event tracking). The resolver file is not part of the sp.js build process, so how does my Tracker know how to resolve?

Is it the case that the Tracker JavaScript code specifies the URL of the schema, without reference to the resolver file? Again, something like this code sample from Snowplow –

window.snowplow_name_here('trackUnstructEvent', {
    schema: 'iglu:com.acme_company/viewed_product/jsonschema/2-0-0',
    data: {
        productId: 'ASO01043',
        category: 'Dresses',
        brand: 'ACME',
        returning: true,
        price: 49.95,
        sizes: ['xs', 's', 'l', 'xl', 'xxl'],
        availableSince: new Date(2013,3,7)
    }
});

– but with “schema” set to the URL for my Schema file? Here is the file URL you suggest: ddajv0fjp16r7.cloudfront.net/schemas/vendor_name/event_name/jsonschema/2-0-0

Should my JavaScript Tracker, therefore, have a parameter like this:

{
  "schema": "iglu:net.cloudfront.ddajv0fjp16r7/schemas/vendor_name/event_name/jsonschema/2-0-0",
  "data": {...}
}

Given this syntax for the “schema” parameter for the JavaScript tracker –

“schema”: “iglu:com.acme_company …” (etc)

Does “com.acme_company” automatically get converted to “acme_company.com”? By automatically, I mean it’s used as a parameter in my JavaScript tracker, and the Tracker interprets “com.something” as “something.com” - ?

This appears to be the case, given that the JavaScript Tracker appears to not use the Resolver file?

Thanks,
Karl

Hello @karl_jones,

You’re exactly right: resolver configuration pertains to the enrichment process. And no, you don’t need a resolver configuration during tracking. Tracking and enrichment are two decoupled processes. Your schema can be missing entirely during “tracking” and “collecting”; events will still be handled by the collector.

But as soon as you start enrichment, you will need a resolver configuration. Enrichment (Scala Hadoop Enrich or Stream Enrich) will try to fetch the schemas attached to the events you sent, using this resolver configuration. If the schema for an event doesn’t exist, that event goes to the enriched/bad S3 bucket. If the schema is found but the event cannot be validated against it, the event goes to enriched/bad as well.

TL;DR

  1. The Tracker doesn’t know where your schemas are located, or whether they exist at all. All you need to provide is an Iglu URI (or schema reference, as @Ihor described it).
  2. The Collector doesn’t know either.
  3. The “enricher” needs to know what your schemas look like, and it will fetch the required schema from a (static) registry.
  4. You may have several registries with different schemas. The resolver configuration helps the “enricher” fetch the correct schemas.
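
To give a feel for point 4, here is a deliberately simplified sketch of choosing which registry to ask first. This is not the real Iglu resolver algorithm. It assumes that registries whose vendorPrefixes match the event's vendor are tried first, and that a lower priority number wins among the rest. The function name orderRepositories is made up; the repository entries are copied from the resolver configuration earlier in the thread.

```javascript
// Simplified illustration only; the real Iglu client may order lookups differently.
// Assumption: prefix matches come first, then ascending priority.
function orderRepositories(repositories, vendor) {
  const matches = (repo) =>
    repo.vendorPrefixes.some((prefix) => vendor.startsWith(prefix));
  return repositories.slice().sort((a, b) => {
    const byPrefix = Number(matches(b)) - Number(matches(a));
    return byPrefix !== 0 ? byPrefix : a.priority - b.priority;
  });
}

const exampleRepositories = [
  {
    name: 'Iglu Central',
    priority: 0,
    vendorPrefixes: ['com.snowplowanalytics'],
    connection: { http: { uri: 'http://iglucentral.com' } }
  },
  {
    name: 'Your Iglu Server',
    priority: 0,
    vendorPrefixes: ['vendor_name'],
    connection: { http: { uri: 'http://ddajv0fjp16r7.cloudfront.net' } }
  }
];

console.log(orderRepositories(exampleRepositories, 'vendor_name').map((r) => r.name));
// [ 'Your Iglu Server', 'Iglu Central' ]
```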

Hope that will be useful.

Anton

Thanks, that is helpful, I have a better understanding now.

My current blocker: how do I pass Unstructured events from the Tracker to the log file at AWS?

Using the code below, I get a log file entry which includes the AppID “Test_KGJ_Aug_24_1036am”.

But none of the Unstructured event values appear in the log file.

<!-- Snowplow starts plowing -->
   <script type="text/javascript">

       ;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
           p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)
           };p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
           n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","{{  theme_asset( '/static/js/snowplow/sp.js' ) }}","snowplow"));

       window.snowplow('newTracker', 'co', 'd1epsz32winqbo.cloudfront.net', { // Initialise a tracker
           appId: 'Test_KGJ_Aug_24_1036am', // Application ID. Make sure you use the same value across all the tags you fire on your trial
           platform: 'web'
       });
       window.snowplow('enableActivityTracking', 30, 30); // Ping every 30 seconds after 30 seconds
       window.snowplow('enableLinkClickTracking');
	   
       window.snowplow('trackPageView');

       window.snowplow('trackUnstructEvent', {
           schema: 'iglu:com.acme_company/viewed_product/jsonschema/2-0-0',
           data: {
               productId: 'ASO01043',
               category: 'Dresses',
               brand: 'ACME',
               returning: true,
               price: 49.95,
               sizes: ['xs', 's', 'l', 'xl', 'xxl'],
               availableSince: new Date(2013,3,7)
           }
       });

   </script>
<!-- Snowplow stops plowing -->

I also tried omitting the “schema: ‘iglu:com.acme_company/viewed_product/jsonschema/2-0-0’” parameter (because this is not used, if I understand correctly). No difference – the log file gets written, but no Unstructured parameters.

Thanks again for your help.

Karl

Hey @karl_jones

What exactly does your log file entry look like? I think you don’t see the parameters because they’re base64-encoded. They should be in the ue_px URL parameter. Otherwise your snippet looks correct.

And again: do not remove the schema parameter with the Iglu URI from the trackUnstructEvent call. If the enricher encounters a JSON object without a schema, it simply invalidates it. I also think it is impossible to send such an invalid (schema-less) JSON object. To clarify what I said in my previous post: the schema may not exist yet at the moment of tracking, but the schema key must be included, because otherwise this is not a self-describing (unstructured) event.
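
A small sketch of why the schema key matters. It wraps a custom event in the outer envelope that shows up when the ue_px value later in this thread is decoded, and refuses an event with no schema. The helper name toSelfDescribingEvent is hypothetical, and this is not the tracker's actual source code, only an illustration of the shape.

```javascript
// Envelope schema as seen in the decoded ue_px payload from this thread.
const UNSTRUCT_EVENT_SCHEMA =
  'iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0';

// Hypothetical helper: wrap a custom event, rejecting one without a schema key.
function toSelfDescribingEvent(event) {
  if (!event || typeof event.schema !== 'string' || event.schema === '') {
    throw new Error('Missing schema: not a self-describing event');
  }
  return {
    schema: UNSTRUCT_EVENT_SCHEMA,
    data: { schema: event.schema, data: event.data }
  };
}

const wrappedEvent = toSelfDescribingEvent({
  schema: 'iglu:com.acme_company/viewed_product/jsonschema/2-0-0',
  data: { productId: 'ASO01043', category: 'Dresses' }
});
console.log(JSON.stringify(wrappedEvent, null, 2));
```

The schema only has to be named here; nothing in this step checks that it exists in a registry.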

Cheers

What is the ue_px parameter? How do I use it?

The page you linked to (“base64 encoded”) does not refer to ue_px.

Do I want “base64 encoded” to be true or false?

Thanks,
Karl

Hello Anton,

Here is the log file content from my most recent test:

#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version
2016-08-24	17:25:03	IND6	480	38.92.143.129	GET	d1epsz32winqbo.cloudfront.net	/i	200	http://www.startribune.dev/	Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520Win64;%2520x64)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Chrome/52.0.2743.116%2520Safari/537.36	stm=1472059504441&e=pv&url=http%253A%252F%252Fwww.startribune.dev%252F&page=StarTribune.com%253A%2520News%252C%2520weather%252C%2520sports%2520from%2520Minneapolis%252C%2520St.%2520Paul%2520and%2520Minnesota&refr=http%253A%252F%252Fwww.startribune.dev%252F&tv=js-2.6.2&tna=co&aid=Test_KGJ_Aug_24_1220pm&p=web&tz=America%252FChicago&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=1&f_java=0&f_gears=0&f_ag=0&res=1366x768&cd=24&cookie=1&eid=79cc0b4e-c9c7-40a5-9169-60ac7738932f&dtm=1472059504434&vp=1156x570&ds=1139x6205&vid=12&sid=82fde36f-0f96-4b91-91c7-fc2b8c600051&duid=473db0c959afc098&fp=3713795072	-	Hit	fC2SFFWC-naLRfJQqYlFM0zEDXQ8FRCNmuI74qMT7OYZQ9rfhsu10g==	d1epsz32winqbo.cloudfront.net	http	988	0.001	-	-	-	Hit	HTTP/1.1
2016-08-24	17:25:03	IND6	480	38.92.143.129	GET	d1epsz32winqbo.cloudfront.net	/i	200	http://www.startribune.dev/	Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520Win64;%2520x64)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Chrome/52.0.2743.116%2520Safari/537.36	stm=1472059504761&e=ue&ue_px=eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy91bnN0cnVjdF9ldmVudC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJzY2hlbWEiOiJpZ2x1Om5ldC5jbG91ZGZyb250LmRkYWp2MGZqcDE2cjcvdHJhdmlzL2pzb25zY2hlbWEvMS0wLTAiLCJkYXRhIjp7InByb2R1Y3RJZCI6IkFTTzAxMDQzIiwiY2F0ZWdvcnkiOiJEcmVzc2VzIiwiYnJhbmQiOiJBQ01FIiwicmV0dXJuaW5nIjp0cnVlLCJwcmljZSI6NDkuOTUsInNpemVzIjpbInhzIiwicyIsImwiLCJ4bCIsInh4bCJdLCJhdmFpbGFibGVTaW5jZSI6IjIwMTMtMDQtMDdUMDU6MDA6MDAuMDAwWiJ9fX0&tv=js-2.6.2&tna=co&aid=Test_KGJ_Aug_24_1220pm&p=web&tz=America%252FChicago&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=1&f_java=0&f_gears=0&f_ag=0&res=1366x768&cd=24&cookie=1&eid=6e318a3b-e3be-4f8d-b909-174d8e3bbe02&dtm=1472059504444&vp=1156x570&ds=1139x6205&vid=12&sid=82fde36f-0f96-4b91-91c7-fc2b8c600051&duid=473db0c959afc098&fp=3713795072&refr=http%253A%252F%252Fwww.startribune.dev%252F&url=http%253A%252F%252Fwww.startribune.dev%252F	-	Hit	vcKBTN78D2WWDgdJeyYiHLbm8kSMnKos-qzP9-z7h7eSBesL3GUabA==	d1epsz32winqbo.cloudfront.net	http	1331	0.008	-	-	-	Hit	HTTP/1.1
2016-08-24	17:25:34	IND6	480	38.92.143.129	GET	d1epsz32winqbo.cloudfront.net	/i	200	http://www.startribune.dev/	Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520Win64;%2520x64)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Chrome/52.0.2743.116%2520Safari/537.36	stm=1472059535105&e=pp&url=http%253A%252F%252Fwww.startribune.dev%252F&page=StarTribune.com%253A%2520News%252C%2520weather%252C%2520sports%2520from%2520Minneapolis%252C%2520St.%2520Paul%2520and%2520Minnesota&refr=http%253A%252F%252Fwww.startribune.dev%252F&pp_mix=0&pp_max=0&pp_miy=400&pp_may=400&tv=js-2.6.2&tna=co&aid=Test_KGJ_Aug_24_1220pm&p=web&tz=America%252FChicago&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=1&f_java=0&f_gears=0&f_ag=0&res=1366x768&cd=24&cookie=1&eid=6c3cdef0-ff3b-4705-8959-788e1106006a&dtm=1472059535094&vp=1156x570&ds=1139x6564&vid=12&sid=82fde36f-0f96-4b91-91c7-fc2b8c600051&duid=473db0c959afc098&fp=3713795072	-	Hit	A5b923zwlI25LJMPRA-4KIyMWBcFrP11s6PiEhLAA13Xsq53C-f40A==	d1epsz32winqbo.cloudfront.net	http	985	0.002	-	-	-	Hit	HTTP/1.1

Here is the Tracker JS (note that I changed the Schema URL to my static Iglu repo):


   

The above does not reference Base 64 – please advise how to do this.

Thanks again,
Karl

Karl,

On the second line, have a look at this:

ue_px=eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy91bnN0cnVjdF9ldmVudC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJzY2hlbWEiOiJpZ2x1Om5ldC5jbG91ZGZyb250LmRkYWp2MGZqcDE2cjcvdHJhdmlzL2pzb25zY2hlbWEvMS0wLTAiLCJkYXRhIjp7InByb2R1Y3RJZCI6IkFTTzAxMDQzIiwiY2F0ZWdvcnkiOiJEcmVzc2VzIiwiYnJhbmQiOiJBQ01FIiwicmV0dXJuaW5nIjp0cnVlLCJwcmljZSI6NDkuOTUsInNpemVzIjpbInhzIiwicyIsImwiLCJ4bCIsInh4bCJdLCJhdmFpbGFibGVTaW5jZSI6IjIwMTMtMDQtMDdUMDU6MDA6MDAuMDAwWiJ9fX0

This is your base64-encoded unstructured event. You can decode it using any base64 decoder, like this: https://www.base64decode.org/

Of course, you don’t need to do this manually; enrichment will extract it for you.
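
If you would rather check a value locally than paste it into a website, a minimal Node.js sketch follows. The function name decodeUePx is made up for illustration. It assumes the value uses the URL-safe base64 alphabet with the padding stripped, which is how the ue_px values in this thread appear.

```javascript
// Hypothetical helper for Node.js: turn a ue_px value back into a JSON object.
function decodeUePx(uePx) {
  // Map the URL-safe alphabet back to standard base64 and restore padding.
  let b64 = uePx.replace(/-/g, '+').replace(/_/g, '/');
  while (b64.length % 4 !== 0) {
    b64 += '=';
  }
  return JSON.parse(Buffer.from(b64, 'base64').toString('utf8'));
}

// Build a small sample value here instead of copying one from a log line.
const sampleUePx = Buffer.from(
  JSON.stringify({
    schema: 'iglu:com.startribune/someevent/jsonschema/1-0-0',
    data: { productId: 'ASO01043' }
  }),
  'utf8'
).toString('base64').replace(/=+$/, '');

console.log(decodeUePx(sampleUePx).schema);
// iglu:com.startribune/someevent/jsonschema/1-0-0
```

To inspect a real event, pass the ue_px value from the log line to decodeUePx; the custom event's own schema and data should then sit one level down, under the data key of the result.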

The Tracker’s HTTP request lands at a collector, which is essentially a web server that writes HTTP requests to a log. For example, a CloudFront log line looks like the following:

`2016-01-20 20:22:55 IND6 480 174.2.224.27 GET d2gtrjee5bqfpl.cloudfront.net /i 200 https://www.properweb.ca/hosting/ Mozilla/5.0%2520(Windows%2520NT%25206.1)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Chrome/47.0.2526.111%2520Safari/537.36 e=ue&ue_px=eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy91bnN0cnVjdF9ldmVudC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9saW5rX2NsaWNrL2pzb25zY2hlbWEvMS0wLTEiLCJkYXRhIjp7InRhcmdldFVybCI6Imh0dHBzOi8vd3d3LnByb3BlcndlYi5jYS9ob3N0aW5nL2NvbXBhcmUtcGVyc29uYWwtcGxhbnMvIiwiZWxlbWVudElkIjoiIiwiZWxlbWVudFRhcmdldCI6IiJ9fX0&tv=js-2.5.3&tna=cf&aid=cfpweb&p=web&tz=America%252FGuatemala&lang=en-US&cs=UTF-8&f_pdf=1&f_qt=0&f_realp=0&f_wma=0&f_dir=0&f_fla=1&f_java=0&f_gears=0&f_ag=0&res=1152x864&cd=24&cookie=1&eid=a8451163-d056-4a6c-a8ef-c612aab3c252&dtm=1453321369503&vp=1152x329&ds=1135x2601&vid=3&sid=a2e39d3f-af4d-48f7-b153-8ca79942a552&duid=830e4863d85df04a&fp=1354193749&refr=https%253A%252F%252Fwww.properweb.ca%252Fdomain-name-registration%252F&url=https%253A%252F%252Fwww.properweb.ca%252Fhosting%252F - Hit yavbRZy0qwso0j-8VBYB-VHIaJjo8K4eaARnXiseXDvKSH8vZ-_Mlg== d2gtrjee5bqfpl.cloudfront.net https 1268 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit` 

Note the URL parameters above: p=, tv=, aid=, etc. These are all predefined tracker parameters; you can find more about them on the Tracker protocol page.

I’d recommend leaving unstructured events base64-encoded, as the encoding is URL-safe.

Bingo! Success! Yes yes yes!

Base64 was the missing key to my sanity. Now it all makes sense!

Thanks again!

Karl

I’m glad it was helpful!

But one more thing, Karl.

In this snippet:

	window.snowplow('trackUnstructEvent', {
		schema: 'iglu:net.cloudfront.ddajv0fjp16r7/travis/jsonschema/1-0-0',
		data: {
			productId: 'ASO01043',
			category: 'Dresses',
			brand: 'ACME',
			returning: true,
			price: 49.95,
			sizes: ['xs', 's', 'l', 'xl', 'xxl'],
			availableSince: new Date(2013,3,7)
		}
	});

You probably have the wrong value in schema. @Ihor gave a good explanation of what it should look like.
This can be valid, but I really doubt your vendor is net.cloudfront.ddajv0fjp16r7 and your event name is travis. Usually the vendor is the reversed company domain name, but this is just a convention.

  1. Don’t confuse the Schema HTTP URI, which looks like this: http://ddajv0fjp16r7.cloudfront.net/travis/jsonschema/1-0-0, with the Iglu URI, which looks like this: iglu:com.acme.karl/someevent/jsonschema/1-0-0.
  2. At the tracking stage you need to specify the Iglu URI.
  3. Enrichment will construct the Schema HTTP URI from the Iglu URI plus the resolver configuration.
  4. The Tracker doesn’t know where your schema is stored.
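
One way to catch the mix-up from point 1 before an event is sent is a quick shape check on the schema string. The sketch below is only an approximation: the pattern is written for this example and is not the official Iglu URI grammar, and the name isIgluUri is hypothetical.

```javascript
// Approximate pattern: iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
const IGLU_URI_PATTERN =
  /^iglu:[A-Za-z0-9_.-]+\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+\/\d+-\d+-\d+$/;

// Hypothetical guard: true only for strings shaped like an Iglu URI.
function isIgluUri(value) {
  return IGLU_URI_PATTERN.test(value);
}

console.log(isIgluUri('iglu:com.acme.karl/someevent/jsonschema/1-0-0')); // true
console.log(isIgluUri('http://ddajv0fjp16r7.cloudfront.net/travis/jsonschema/1-0-0')); // false
// A file path squeezed into an Iglu URI has too many segments:
console.log(isIgluUri('iglu:net.cloudfront.ddajv0fjp16r7/schemas/vendor_name/event_name/jsonschema/2-0-0')); // false
```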

Hello Anton,

Thanks for the followup observations.

(A) Event name “travis” is temporary, for testing – I followed a code example provided by Travis Devitt (who replied to one of my Discourse questions), located here:

https://github.com/tdevitt/snowplow_examples

So, yes, “travis” is not an appropriate event name – I will delete this now that I know how Snowplow actually works.

(B) I see your point (I think) about not confusing Schema HTTP URI with Iglu URI.

Let me restate the issue, to confirm my understanding. Here is the next version of JavaScript tracker code (note that I am working for the Star Tribune, so I use their name in my schema string). Schema is not a working URL – rather, it is an identifier which gets written into the log file, which I will reference during the Enrichment phase:

		window.snowplow('trackUnstructEvent', {
			schema: 'iglu:com.startribune/someevent/jsonschema/1-0-0',
			data: {
				productId: 'ASO01043',
				category: 'Dresses',
				brand: 'ACME',
				returning: true,
				price: 49.95,
				sizes: ['xs', 's', 'l', 'xl', 'xxl'],
				availableSince: new Date(2013,3,7)
			}
		});

Look good?

Thanks again for your help.

Karl

Absolutely!

More precisely, at the enrichment step you’ll need to add your static registry (which I believe is ddajv0fjp16r7.cloudfront.net?) to the resolver configuration. And that’s it.