We run Snowplow Collectors across two data centers, so we pay for outbound bandwidth from one of them, and that bandwidth is our biggest cost.
From what I understand, Snowplow Stream Collectors accept inbound events as JSON over HTTP, and then use Thrift to serialise each message before passing it to the data sink.
We’d like to reduce the bandwidth used by the connection that sends the JSON requests from the outbound platform, and an obvious optimisation is to move from JSON to a more compact format like Protobuf or Thrift.
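To give a rough sense of the kind of saving we have in mind, here is a minimal sketch that encodes the same event as JSON and as a fixed-layout binary record. Everything in it is an assumption for illustration: the event fields are made up rather than taken from the Snowplow schema, and Python's `struct` module stands in for Protobuf or Thrift, which use their own encodings and would give different sizes. HTTP headers are not counted.

```python
import json
import struct

# Hypothetical event: these field names are illustrative only,
# not the actual Snowplow tracker payload.
event = {
    "event_id": 123456789,
    "timestamp_ms": 1700000000000,
    "user_id": 42,
    "value": 3.14,
}

# Assumed fixed layout: three unsigned 64-bit integers and one
# 64-bit float, big-endian. A stand-in for a real binary format.
_LAYOUT = struct.Struct(">QQQd")


def encode_json(e: dict) -> bytes:
    """Serialise the event as compact JSON (no extra whitespace)."""
    return json.dumps(e, separators=(",", ":")).encode("utf-8")


def encode_binary(e: dict) -> bytes:
    """Pack the event into the fixed binary layout (no field names on the wire)."""
    return _LAYOUT.pack(e["event_id"], e["timestamp_ms"], e["user_id"], e["value"])


def decode_binary(b: bytes) -> dict:
    """Unpack the fixed binary layout back into a dict."""
    event_id, timestamp_ms, user_id, value = _LAYOUT.unpack(b)
    return {
        "event_id": event_id,
        "timestamp_ms": timestamp_ms,
        "user_id": user_id,
        "value": value,
    }


json_bytes = encode_json(event)
binary_bytes = encode_binary(event)

if __name__ == "__main__":
    print(f"JSON:   {len(json_bytes)} bytes")
    print(f"Binary: {len(binary_bytes)} bytes")
```

Most of the difference comes from not repeating field names in every message, which is also what Protobuf and Thrift avoid by using numeric field tags. The real saving would depend on our actual payloads, so this is only an order-of-magnitude check.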
Has anyone optimised the data collection in this way? One possibility is to move the Snowplow collector to the outbound platform, so that the Thrift message, rather than the JSON request, is what gets sent over the wire to the sink on the inbound platform.
Any advice and feedback are most welcome! Unfortunately, consolidating our data centers is not an option in the short term.