Hey @mike - thanks for raising this, it’s a really interesting idea.
As I understand it, anything that could be done across Scala Stream Collector plus Stream/Spark Enrich (let’s call this “status quo”) to implement a webhook could be done in an upstream Snowplow webhook proxy (call this “proxy”) - in other words, the two approaches are functionally equivalent. If I’ve missed something that only a proxy could do, shout!
But assuming they are functionally equivalent, it then becomes a question of which is preferable.
Here are the pros and cons I came up with for each:
Proxy approach
+ Allows webhook adapters to be written in more languages
+ De-couples webhook releases from snowplow/snowplow releases
+ Prevents webhook-specific functionality from bleeding into Scala Stream Collector (which a subset of webhooks would otherwise require)
- Increases code complexity (a proxy app is more complex than the adapter files in the status quo)
- Leads to fragmentation of Snowplow webhook support (multiple competing/conflicting implementations, including non-open-source ones, of the Acme webhook for Snowplow)
- Increases fragility, because you are adding a second ack-less hop between the source system and the durable raw event stream [unless you “cut out the middleman” and have the proxy write directly to Kafka/Kinesis/Pub/Sub etc.]
- Makes it impossible to support webhooks out of the box in Snowplow Mini
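To make the proxy idea concrete, here is a minimal sketch of what its core could look like: a pure function mapping an incoming webhook payload onto the tracker-protocol parameters the collector already accepts. The “Acme” vendor, its field names, and the schema URI are all invented for illustration, and a real proxy would also need JSON parsing, signature verification and so on.

```scala
// Hypothetical sketch: "Acme", its field names and the schema URI are invented.
object AcmeProxyAdapter {
  // Map an incoming Acme payload (already parsed into key/value pairs)
  // onto Snowplow tracker-protocol parameters for a self-describing event.
  def toCollectorParams(acme: Map[String, String]): Either[String, Map[String, String]] =
    for {
      eventType <- acme.get("event_type").toRight("missing field: event_type")
      ts        <- acme.get("timestamp").toRight("missing field: timestamp")
    } yield Map(
      "e"     -> "ue",               // self-describing ("unstructured") event
      "p"     -> "srv",              // server-side platform
      "tv"    -> "acme-proxy-0.1.0", // hypothetical tracker version string
      "ue_pr" -> s"""{"schema":"iglu:com.acme/webhook_event/jsonschema/1-0-0","data":{"type":"$eventType","ts":"$ts"}}"""
    )
}
```

The proxy itself would then just URL-encode these parameters and send them to the collector. The point is that everything above is plain data transformation, which is exactly the part a contributor could write in whichever language their proxy is built in.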
Status quo approach
+ Very simple to deploy - a vanilla Snowplow/Snowplow Mini install supports N webhooks, with no proliferation of web servers as in the proxy approach
+ More robust - no additional moving parts
+ Well standardized - Snowplow supports N blessed webhooks. Every Snowplow user gets the same events from Acme
- Writing webhook adapters in Scala is intimidating/high barrier to entry
- Over time Scala Stream Collector will have to grow awareness of certain webhooks’ behavior
- Testing is more complex (especially when we add webhooks that require Scala Stream Collector changes)
I’m sure I’ve missed some strengths and weaknesses of both approaches - look forward to further thoughts on this! I think what’s interesting though is that there may be a “tipping point” around the complexity of a specific webhook provider (particularly around authentication/authorization), where it then makes sense to go the proxy route rather than the status quo.
I’d be really interested to see one of the most complex webhooks (Xero?) implemented as a proxy.
I’m also keen to consider whether there’s a “third way”, which combines some of the benefits of both approaches. For example:
- We know that writing Scala code is a barrier for many potential webhook contributors, but in fact webhook implementations all follow quite a fixed set of rules - so could we provide a domain-specific language to make it much easier to write these webhooks?
- We know that testability of these webhooks is important - should we be coming up with an “Avalanche for webhooks”, which makes the roll-out and continuous integration testing of these webhooks much less painful?
- We know that the webhook adapters are not particularly well isolated from the snowplow/snowplow mono-repo - what if we extracted each of these as a library, with its own release cadence? What if we turned the webhook adapters into full-blown Java modules?
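On the DSL point, here is a sketch of what such a thing might look like if it stayed embedded in Scala. Everything in it (the `adapter` builder, the `~>` mapping operator, the field names) is invented purely to illustrate the shape, not a proposal for an actual API:

```scala
// Hypothetical sketch of an embedded DSL for declaring webhook adapters.
object WebhookDsl {
  final case class FieldRule(source: String, target: String)

  final case class Adapter(vendor: String, rules: List[FieldRule]) {
    // Apply every declared rule to an incoming payload, failing fast
    // on the first missing field.
    def run(payload: Map[String, String]): Either[String, Map[String, String]] =
      rules.foldLeft[Either[String, Map[String, String]]](Right(Map.empty)) {
        case (acc, FieldRule(src, tgt)) =>
          for {
            out <- acc
            v   <- payload.get(src).toRight(s"$vendor: missing field '$src'")
          } yield out + (tgt -> v)
      }
  }

  def adapter(vendor: String)(rules: FieldRule*): Adapter =
    Adapter(vendor, rules.toList)

  // "source" ~> "target" reads as "map this payload field to this event field".
  implicit class Arrow(val source: String) extends AnyVal {
    def ~>(target: String): FieldRule = FieldRule(source, target)
  }
}

object Example {
  import WebhookDsl._

  // An adapter definition is pure data: two lines per mapped field.
  val acme: Adapter = adapter("com.acme")(
    "event_type" ~> "se_ca",
    "value"      ~> "se_va"
  )
}
```

Because the adapter definition is pure data, the same description could be interpreted by the Scala Stream Collector, exercised by a test harness, or even rendered as documentation, which is what makes a DSL attractive here.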
One thing that will definitely help with the build-out and ongoing maintenance of webhook support will be the addition of schema inference - @anton and @BenFradet are working on the RFC for this and will share in due course.
Look forward to the community’s thoughts on all this!