Server-side infrastructure

Hey all!

We’re trying to figure out a good infrastructure for sending server-side events to Snowplow. Our tech team is concerned about sending events directly, since the HTTP requests are blocking and could impact the application itself if the collector has latency or availability issues. As analytics/tracking is orthogonal to the application, it makes sense to avoid this extra point of failure.

This is the infrastructure we’re currently evaluating:

  • Application: Write events to a local events.log file
  • Fluentd: Ship the events out to an SQS queue
  • Logstash: Read messages from SQS and send them to the collector, through a plugin that uses the Snowplow gem (sketched below)
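
To make the last step concrete, here is a rough sketch of what such a plugin could look like, assuming the 0.x snowplow-tracker API and a Logstash 2.x-style `receive` interface; the plugin name, config option and event fields are all placeholders:

```ruby
# Rough sketch of a Logstash output plugin wrapping the Snowplow gem
require "logstash/outputs/base"
require "logstash/namespace"
require "snowplow-tracker"

class LogStash::Outputs::Snowplow < LogStash::Outputs::Base
  config_name "snowplow"

  # Hostname of your Snowplow collector
  config :collector_host, :validate => :string, :required => true

  def register
    # A synchronous emitter is fine here: SQS already decouples this
    # process from the application, so blocking only slows the drainer
    emitter  = SnowplowTracker::Emitter.new(@collector_host)
    @tracker = SnowplowTracker::Tracker.new(emitter)
  end

  def receive(event)
    # Assumes the application writes category/action fields to events.log;
    # adjust to whatever schema your log lines actually carry
    @tracker.track_struct_event(event["category"], event["action"])
  end
end
```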

It would be nice to hear your feedback about this :slight_smile: How have you structured your applications to send data to Snowplow?

Thanks!
Bernardo

Hey @bernardosrulzon - great question. What languages/frameworks are you looking to support server-side?

@alex Mostly Ruby applications (Rails and Daemon Kit)

Hi @bernardosrulzon - did you look at the Ruby Tracker’s AsyncEmitter?
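
For reference, a minimal sketch of what that looks like, assuming the 0.x-era snowplow-tracker API (the collector host is a placeholder):

```ruby
require "snowplow-tracker"

# The AsyncEmitter sends events from a pool of background threads, so
# track_* calls return immediately instead of blocking on the collector
emitter = SnowplowTracker::AsyncEmitter.new(
  "collector.example.com",  # placeholder collector endpoint
  {
    :buffer_size => 10,     # batch this many events per request
    :on_failure  => lambda { |sent, unsent| warn "#{unsent.size} events failed" }
  }
)
tracker = SnowplowTracker::Tracker.new(emitter)

tracker.track_struct_event("checkout", "order_created")
```

The trade-off versus a queue-based design: events are buffered in-process, so anything not yet sent is lost if the process dies, whereas events.log + SQS survives application crashes.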

Are your tech team dead-set on there being a process boundary and intermediate queue between the application and your emitting code?

Thanks, @alex!

We’re open-minded on the subject - the thing is that the event stream coming from the Ruby applications might be consumed by a variety of clients. The Fluentd layer would be responsible for enabling this “parallel processing” of events.

…but Snowplow itself could be the centralized log if we switch to the real-time pipeline, and other clients could consume the events coming out of the collector’s “good” stream.
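
For illustration, a rough sketch of what such a downstream client might look like, assuming it reads the enriched good stream in Kinesis via the aws-sdk gem (the raw collector stream is Thrift-encoded, so the enriched stream is usually the friendlier one to consume); stream name, region and shard handling are simplified placeholders:

```ruby
require "aws-sdk"

kinesis = Aws::Kinesis::Client.new(region: "us-east-1")  # placeholder region

# Start reading new records from one shard of the enriched good stream
iterator = kinesis.get_shard_iterator(
  stream_name: "snowplow-enriched-good",  # placeholder stream name
  shard_id: "shardId-000000000000",
  shard_iterator_type: "LATEST"
).shard_iterator

loop do
  resp = kinesis.get_records(shard_iterator: iterator, limit: 100)
  resp.records.each do |record|
    # Enriched events are tab-separated; the first fields include
    # app_id, platform and the ETL timestamp
    fields = record.data.split("\t")
    puts fields.first(3).inspect
  end
  iterator = resp.next_shard_iterator
  sleep 1  # stay under the per-shard read throughput limits
end
```

A production consumer would use the Kinesis Client Library (or similar) to handle multiple shards, checkpointing and resharding.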

Any thoughts/best practices on that?