Adding one field to all track commands

Hey all,

We’ve been using Snowplow for a few weeks and everything has been going great. However, we’re having a bit of difficulty adding a custom field (org_id) that we’d like to include on all tracking messages. Is there an easy or preferred way to do this?

We are looking at self-describing events, which we are very excited about; however, from the wiki it seems like the schema registry is only for paid/trial customers. Is this possible with an open-source setup?

Thanks,
Patrick

I should note that we are primarily using Google Tag Manager for our tracking. Also, we are not currently writing to anything but Elasticsearch and a custom data destination (via a Lambda function).

Hey @pcb,

Iglu is open-source. Here’s the setup guide.

Paid customers get access to the Snowplow Console, which offers a simple and intuitive UI to manage schema creation, linting and uploading for both testing and production. As an open-source user it’s not such a nice user experience, but you can use the igluctl CLI for that workflow.
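For example, a typical igluctl run looks roughly like this (just a sketch; the schemas folder and bucket name are placeholders, and the exact flags may vary by igluctl version, so check the docs):

```
# Lint the schemas in a local folder before publishing
igluctl lint schemas/

# Copy them up to an S3 bucket that serves as a static registry
igluctl static s3cp schemas/ my-iglu-bucket --region us-east-1
```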

Best,

Thanks Colm.

I’ve read through that and I’m still a bit confused. I’m currently using Scala Stream with Kinesis. Questions:

  1. Does the Scala Stream Enricher already have an Iglu client? I know I specify a resolver for it to use, so I figured it was using one somehow.
  2. Can I use S3 to host my schemas? Either by specifying the file URL or by making the bucket a public website?

Thanks for the help!
Patrick

Does the Scala Stream Enricher already have an Iglu client? I know I specify a resolver for it to use, so I figured it was using one somehow.

So if you mean an instance of Iglu, there’s a public Schema Repo hosted by Snowplow called ‘Iglu Central’ which holds the schemas for standard events.

If you mean an Iglu client, I’m not 100% sure actually, as it’s been a long time since I’ve dealt with this kind of thing. As far as I remember, you need to set one up if you’re using Iglu Server, as distinct from whatever’s in the pipeline already. The resolver just points to an Iglu instance, so I think the out-of-the-box pipeline works with no client, and that a client is only required for Iglu Server rather than a static repo as you describe below (but please don’t be mad if I’m wrong :smiley: ). I was confused here - clarified below.

Can I use S3 to host my schemas? Either by specifying the file URL or by making the bucket a public website?

Yes, the older versions of the pipeline worked this way - here are the docs on static repo setup.

Best,

You absolutely can use a public-website S3 bucket to store your schemas; they just need to follow a pretty specific folder structure to work. I found the igluctl documentation to be the most helpful when I was just getting started.
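For reference, the layout of a static registry looks like this (the vendor and event names here are placeholders for your own):

```
schemas/
  com.example/          <- vendor
    my_event/           <- schema name
      jsonschema/       <- format
        1-0-0           <- version (the schema file itself, no extension)
        1-0-1
```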

The Scala Stream Enricher does have an Iglu client built in, as you assumed. The resolver configuration is how it decides which repositories to use when validating schemas. You will need to update your resolver configuration to add a repository that points to your public S3 bucket.
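Something like this in your resolver config should do it (a sketch; the bucket URL and the com.example vendor prefix are placeholders for your own values):

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": { "http": { "uri": "http://iglucentral.com" } }
      },
      {
        "name": "My S3 registry",
        "priority": 5,
        "vendorPrefixes": [ "com.example" ],
        "connection": { "http": { "uri": "http://my-iglu-bucket.s3-website-us-east-1.amazonaws.com" } }
      }
    ]
  }
}
```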

Happy to provide any additional help you need, or some sample files that I used to get started if you want.


Thanks so much, and sorry, I totally missed that in the docs. Brain must have been fried.

I’ve got self-describing JSONs writing out into Elasticsearch, hooray! However, we are getting really long names. For instance, my self-describing event’s schema has data : { industry : { type : "string" }}, but the name in Elasticsearch is unstruct_event_com_submittable_snowplow_submittable_event_1.industry, which is a tad unwieldy. Any idea how to short-circuit that to a shorter label like industry?
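For reference, the event payload we send looks something like this (the value and schema version are simplified for illustration):

```json
{
  "schema": "iglu:com.submittable.snowplow/submittable_event/jsonschema/1-0-0",
  "data": {
    "industry": "publishing"
  }
}
```

So it looks like the Elasticsearch field name is being built from unstruct_event_ plus the schema’s vendor, name, and major version.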

Thanks!
Patrick