Thanks for the clear reply. Right now, I'm starting to look at the Snowplow Analytics pipeline as an alternative to a DIY solution. It looks like the Snowplow platform is very modular, and at least the schema registry (Iglu) can be deployed outside the context of the pipeline. I'm evaluating whether it is feasible for me to deploy the registry without the rest of the pipeline, and what interfaces and tools are available for it. The schema registry is a higher priority for us than the rest of the pipeline.
So, considering standalone use of Iglu, I'd need some way of interacting with it from our stack. Python is the primary, and currently the only, language we use to implement our pipelines, including PySpark on EMR. I would like to understand how much effort it would take to write and maintain a homegrown Python client for something like Iglu, and, if we later choose to adopt the Snowplow pipeline, how much effort the integration would take.
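To make the scope of a homegrown client concrete, here is a minimal Python sketch of the two pieces I would expect it to need: parsing an Iglu schema key (`iglu:<vendor>/<name>/<format>/<version>`) and turning it into a lookup URL. It assumes the static-repository layout `<base>/schemas/<vendor>/<name>/<format>/<version>`. The names `SchemaKey`, `parse_iglu_uri`, `schema_url`, and `fetch_schema` are hypothetical, and a full Iglu Server may use a different path prefix and require an API key, so this would need checking against the Iglu docs.

```python
import json
import re
import urllib.request
from dataclasses import dataclass

# Iglu schema keys look like: iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
_IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9_.-]+)/(?P<name>[a-zA-Z0-9_-]+)/"
    r"(?P<format>[a-zA-Z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)


@dataclass(frozen=True)
class SchemaKey:
    """Hypothetical container for the four parts of an Iglu schema key."""
    vendor: str
    name: str
    format: str
    version: str


def parse_iglu_uri(uri: str) -> SchemaKey:
    """Split an iglu: URI into its parts, raising ValueError if malformed."""
    match = _IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"Not a valid Iglu schema URI: {uri!r}")
    return SchemaKey(**match.groupdict())


def schema_url(base_url: str, key: SchemaKey) -> str:
    """Build the lookup URL, ASSUMING a static-repository path layout."""
    return (
        f"{base_url.rstrip('/')}/schemas/"
        f"{key.vendor}/{key.name}/{key.format}/{key.version}"
    )


def fetch_schema(base_url: str, uri: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON schema for an iglu: URI (no auth, no caching)."""
    url = schema_url(base_url, parse_iglu_uri(uri))
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With something like this, `fetch_schema(base_url, "iglu:com.acme/page_view/jsonschema/1-0-0")` would return the schema as a dict that could be passed to a JSON Schema validator inside a PySpark job. Caching, authentication, and multi-repository resolution are left out here and are probably where most of the maintenance effort would go.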
I'm happy to chat offline if you wish.