Bulk import of old events into Snowplow from Apache Kafka

We are doing a POC to switch to Snowplow from an older system that we currently have in place. We have a bunch of data sitting in Kafka from the old system that I would like to pipe into Snowplow, so that our data analysts can go to a single place for all their reporting needs.

I am planning on writing a utility app to consume the data from Kafka and send it through Snowplow. I wanted to get the community’s thoughts on doing this.

Do the collectors have an API spec that I can follow? Swagger docs maybe?

I believe I should still be able to pump in the events using the protocol specified here: https://github.com/snowplow/snowplow/wiki/SnowPlow-Tracker-Protocol

I am thinking the URL to send to should be http://[collector_host]/i, based on the requests I am seeing go through in the Snowplow Mini example events.
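Something roughly like this is what I had in mind for a single event — just a sketch using the tracker protocol fields from that wiki page, with a made-up collector host and placeholder values:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CollectorPing {

    public static void main(String[] args) throws Exception {
        // Tracker protocol fields (see the wiki page linked above).
        // The values here are placeholders for illustration only.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("e", "se");                  // event type: structured event
        params.put("p", "srv");                 // platform: server-side app
        params.put("tv", "kafka-importer-0.1"); // tracker name/version (made up)
        params.put("aid", "legacy-import");     // application id (made up)
        params.put("se_ca", "legacy");          // structured event category
        params.put("se_ac", "page_view");       // structured event action

        String query = params.entrySet().stream()
                .map(en -> en.getKey() + "="
                        + URLEncoder.encode(en.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));

        // GET to the collector's /i endpoint; the collector host is an assumption.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://collector.example.com/i?" + query))
                .GET()
                .build();

        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        System.out.println("Collector responded with HTTP " + response.statusCode());
    }
}
```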

What are your thoughts? Are there better ways to do this?

Hey @vish - that sounds like a nice approach. What language are you planning on writing the utility app in?

Probably Java, since the Java library for Kafka is pretty nice.
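Roughly the shape I have in mind for the consume-and-forward loop (just a sketch — the broker address, group id, and topic name are made up):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LegacyEventForwarder {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "snowplow-backfill");          // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("legacy-events"));     // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Map the old event format to Snowplow fields and send to the collector.
                    sendToCollector(record.value());
                }
            }
        }
    }

    private static void sendToCollector(String legacyEvent) {
        // Whatever we settle on: raw tracker-protocol requests or a Snowplow tracker.
    }
}
```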

Would you know if there is a better API for this? Maybe something that supports sending in multiple events at the same time?

Hi @vish
You could also utilise the Snowplow Java Tracker to send the events to your collector within your utility application. This would remove the need to worry about the Snowplow Tracker Protocol as the Tracker will do this for you.
You can also utilise the BatchEmitter, which will allow you to send the events in batches to the collector.
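Something along these lines, roughly — treat it as a sketch rather than copy-paste code, as the exact builder method names depend on the Java Tracker version you pull in (the tracker's README has the current API), and the collector URL, namespace, and app id below are placeholders:

```java
import com.snowplowanalytics.snowplow.tracker.Tracker;
import com.snowplowanalytics.snowplow.tracker.emitter.BatchEmitter;
import com.snowplowanalytics.snowplow.tracker.events.Structured;

public class TrackerExample {

    public static void main(String[] args) {
        // The BatchEmitter buffers tracked events and sends them to the collector in batches.
        BatchEmitter emitter = BatchEmitter.builder()
                .url("http://collector.example.com") // placeholder collector endpoint
                .build();

        // Namespace and app id are placeholders for your utility app.
        Tracker tracker = new Tracker.TrackerBuilder(emitter, "kafka-importer", "legacy-import")
                .build();

        // One tracked event per record consumed from Kafka, for example:
        tracker.track(Structured.builder()
                .category("legacy")
                .action("page_view")
                .build());
    }
}
```

The batching done by the BatchEmitter should also cover your question about sending multiple events at the same time.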
