Iglu JSON caching


#1

Hello,
First-time Snowplow/Iglu user here.

My main goal is to set up Snowplow for production use inside my company. While playing around, I was thinking of using S3 as a static Iglu repository. I am very interested in building a robust, highly available and cheap(!) solution.

I am wondering whether every incoming event is validated by the Iglu Schema Validator. If so, does this mean that the load on the static server will be proportional to the incoming data? Is there any kind of caching?

I am not very familiar with Scala, but I have searched the Enricher code and the Iglu client repository for any sign of caching and haven’t found anything.

Do you think this affects performance? Should we put something like a load balancer in front of the repository, or is that overkill?

Please share your experience and insights on Iglu load requirements.

Thank you.


#2

Hey @alexopoulos7,

Sure thing, the Scala Iglu client uses a cache! Its size can be configured with the cacheSize setting in the resolver configuration, and its TTL with cacheTtl. Under the hood this is an LRU cache, which I believe is the most efficient approach here.
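
To make the idea concrete, here is a minimal Python sketch of an LRU cache with a TTL. It is not the Iglu client's actual code (that is written in Scala); the class name LruTtlCache, the fetch_schema function and the example schema key are all hypothetical, and max_size and ttl_seconds only play the roles that cacheSize and cacheTtl play in the resolver configuration.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Hypothetical LRU cache with a per-entry TTL (illustration only)."""

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self.max_size = max_size        # analogous to cacheSize
        self.ttl_seconds = ttl_seconds  # analogous to cacheTtl
        self.clock = clock
        self._entries = OrderedDict()   # key -> (stored_at, value)

    def get(self, key, fetch):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            # Fresh hit: mark as most recently used, no registry call.
            self._entries.move_to_end(key)
            return entry[1]
        # Miss or expired entry: go to the registry and store the result.
        value = fetch(key)
        self._entries[key] = (now, value)
        self._entries.move_to_end(key)
        # Evict least recently used entries once over capacity.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)
        return value


registry_calls = []


def fetch_schema(key):
    # Stand-in for an HTTP request to the static registry.
    registry_calls.append(key)
    return {"self": key}


cache = LruTtlCache(max_size=500, ttl_seconds=600)
for _ in range(1000):  # 1000 events that all use the same schema
    cache.get("iglu:com.acme/click/jsonschema/1-0-0", fetch_schema)

print(len(registry_calls))  # 1
```

In this sketch, a thousand events that share one schema cause a single registry lookup; further lookups happen only after the entry expires or is evicted.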

Most likely, your registry will receive about as many HTTP requests as you have distinct schemas in your dataset (plus a few auxiliary schemas), multiplied by the number of nodes. That is usually a very small number, so I don’t think this can be a real performance concern.
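
As a rough back-of-the-envelope check, the estimate above can be written as a small Python function. This is only a sketch under stated assumptions: every node keeps its own cache, nothing is evicted for lack of space, and each schema is fetched once per node per TTL window. The function name and the example numbers are made up for illustration.

```python
def estimated_registry_requests(distinct_schemas, auxiliary_schemas,
                                nodes, ttl_windows=1):
    """Rough upper estimate of registry requests (assumptions in the text above)."""
    return (distinct_schemas + auxiliary_schemas) * nodes * ttl_windows


# Example: 20 schemas in the dataset, 5 auxiliary schemas, 4 nodes.
print(estimated_registry_requests(20, 5, 4))  # 100
```

Note that the event volume does not appear in the formula at all, which is the point: under these assumptions the registry load depends on the number of schemas and nodes, not on traffic.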


#3

@anton beat me to it!

If you’re interested in the logic of the LRU cache it lives here.