Support for multiple emitters in the mobile trackers


#1

I came across this thread on the old user list regarding support for multiple emitters: https://groups.google.com/forum/#!topic/snowplow-user/6ELjJPGRPjU

Any update? I would like to do the same thing: send data to both a realtime and batch collector for one of the workflows I’m attempting to support.


#2

Hi @dcow - no update on this currently, it somewhat fell off the radar. I’ve created a meta-ticket to track the feature:

Add support for multiple emitters #2867


#3

So the interim solution is just to use multiple trackers?


#4

Hi @dcow, unfortunately, because of how the Emitter objects persist data, it is not possible to run more than one Tracker at a time. They are all hardcoded to point to one database and one table within that database. There is also no metadata stored with the event to indicate which Emitter should send it.
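
To illustrate the missing piece, here is a minimal sketch of an event store that tags each queued event with the emitter that should send it. This is not the tracker's actual schema or API: the `events` table, the `emitter_id` column, and the helper functions are hypothetical names, and an in-memory SQLite database stands in for the on-device store.

```python
import sqlite3


def open_store():
    """Create an in-memory event store (hypothetical schema, not the tracker's own)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " emitter_id TEXT NOT NULL,"  # the metadata the current store lacks
        " payload TEXT NOT NULL)"
    )
    return db


def add_event(db, emitter_id, payload):
    """Queue a payload for one specific emitter."""
    db.execute(
        "INSERT INTO events (emitter_id, payload) VALUES (?, ?)",
        (emitter_id, payload),
    )
    db.commit()


def pending_for(db, emitter_id):
    """Return the queued payloads for one emitter, oldest first."""
    rows = db.execute(
        "SELECT payload FROM events WHERE emitter_id = ? ORDER BY id",
        (emitter_id,),
    )
    return [row[0] for row in rows]
```

With a column like this, two emitters could share one database and one table, and each would read back only its own events.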


#5

Hi @josh,

What’s the update on this? Is there any timeline for supporting multiple emitters from the mobile trackers?

And if it is not currently supported, what would be the best way to send some events to the batch pipeline and some to the real-time pipeline?


#6

Hi @rahul - there’s no timeline for this support at this time.

Have you tried defining a second tracker instance - what’s the roadblock that you hit doing this?


#7

Hi @alex, how does creating 2 instances help? I need to track 2 types of events: one for real-time and the other for batch processing. For this I am planning to have 2 collectors: one Clojure Collector and one Scala Stream Collector.

To do this I need to have 2 emitter objects pointing to 2 different collectors. Correct me if I am wrong.

What is your suggestion for any workaround in this case?
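
To make the request concrete, the behaviour being asked for could be sketched roughly as follows. Everything here is hypothetical: `RecordingEmitter` and `RoutingTracker` are illustrative names, not classes from the Snowplow trackers, the collector URLs are placeholders, and the emitter only records events instead of sending them over HTTP.

```python
from typing import Callable, Dict, List


class RecordingEmitter:
    """Stand-in for an emitter; records events instead of sending them."""

    def __init__(self, collector_url: str) -> None:
        self.collector_url = collector_url  # placeholder, never contacted
        self.sent: List[dict] = []

    def emit(self, event: dict) -> None:
        self.sent.append(event)


class RoutingTracker:
    """Hypothetical tracker that picks one emitter per event."""

    def __init__(
        self,
        emitters: Dict[str, RecordingEmitter],
        route: Callable[[dict], str],
    ) -> None:
        self.emitters = emitters
        self.route = route  # maps an event to an emitter name

    def track(self, event: dict) -> None:
        self.emitters[self.route(event)].emit(event)


realtime = RecordingEmitter("https://realtime-collector.example.com")
batch = RecordingEmitter("https://batch-collector.example.com")
tracker = RoutingTracker(
    {"realtime": realtime, "batch": batch},
    route=lambda event: "realtime" if event.get("urgent") else "batch",
)
tracker.track({"name": "purchase", "urgent": True})
tracker.track({"name": "page_view"})
```

Here the `urgent` flag is just an assumed way to tell the two event types apart; any rule that maps an event to an emitter name would do.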


#8

Hi @alex,

Your input would be very helpful on this. We are using Snowplow heavily, and our use case demands sending events to the batch pipeline and the real-time pipeline separately. We will be sending ~100 million events every day to each pipeline (i.e. 200 million events/day in total).

It would be great if you could help us with a workaround for this. We would also love to see this feature (allowing 2 emitters to send events to 2 different collectors) implemented in the Snowplow stack.

There is one more question you can help me with. For real-time events we are planning to use Kafka instead of Kinesis.

The Snowplow 85 Metamorphosis documentation says that Kafka support is in beta:

  • Is it still in beta? Can we use it in production?
  • If yes, how many companies are using it in production? (It would be great if you could name a few.)

These answers will help us move ahead with confidence on this implementation.


#9

Hi @rahul - the Snowplow Kafka support is still relatively immature compared to our AWS batch and real-time pipelines. Please do share your findings as you roll out your deployment through testing into production.

We’re not aware of a workaround - I think the most straightforward solution here is for multiple emitters to be supported by the trackers.


#10

@alex thank you for your reply. We will be sure to share our findings here. We will experiment with multiple emitters sending data to multiple collectors, and also with the Snowplow Kafka pipeline.
Also, how can I raise a request to get this feature implemented in the Snowplow stack? I believe that sooner or later many others will also need multiple emitters in order to send events to multiple collectors.


#11

Hi @rahul - we’ll reach out separately to discuss the multiple emitters feature with you.