Kafka Connect S3 sink Docker

Hi, I’m not sure if this is the right platform to ask for help with this.

I’m trying to pull a Kafka Connect Docker image and configure it as an S3 sink.

The use case: I have enriched events in Kafka, and I want to push those events to an S3 bucket every 15 minutes or every 1,000 events (whichever comes first); from there I’ll use EmrEtl to load them into Redshift.
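From skimming the Confluent S3 sink connector docs, I think the batching side would be controlled by something like the config below. The connector name, topic, bucket, and region are placeholders, and I haven’t tested any of this:

```json
{
  "name": "s3-sink-enriched",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "enriched-events",
    "s3.bucket.name": "my-enriched-events",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000",
    "rotate.schedule.interval.ms": "900000",
    "timezone": "UTC",
    "schema.compatibility": "NONE"
  }
}
```

As far as I can tell, `flush.size` commits a file to S3 once 1,000 records have accumulated, and `rotate.schedule.interval.ms` forces a commit every 15 minutes (900,000 ms) of wall-clock time, whichever comes first — which matches what I’m after. Please correct me if I’ve misread the docs.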

Beyond that I’m failing — I really don’t know how or where to start. So please help me with a Dockerfile (not docker-compose) along with the config to set this up.
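Edit: to show where I’ve got to, here is the Dockerfile I’ve pieced together so far. The base-image tag and the connector version are guesses on my part:

```dockerfile
# Confluent's Kafka Connect image; pin a tag that matches your cluster
# (7.4.0 here is just an example, not a recommendation).
FROM confluentinc/cp-kafka-connect:7.4.0

# Pull the S3 sink connector from Confluent Hub into the plugin path.
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-s3:latest
```

My understanding is that the container then needs the usual `CONNECT_*` environment variables (bootstrap servers, the three internal storage topics, converters, and a `CONNECT_PLUGIN_PATH` that includes `/usr/share/confluent-hub-components`), after which the connector config above would be registered with `curl -X POST -H "Content-Type: application/json" --data @s3-sink.json http://localhost:8083/connectors`. Does that look right?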

Thank you.

Bumping this up.

Any leads appreciated.

Hi pramod,

Apologies, we missed this the first time round.

Kafka Connect isn’t a Snowplow product, so I’m not sure we can give you much assistance with it. Confluent has extensive documentation on it, so I imagine one of their forums might be a more fruitful avenue for questions specific to Kafka Connect.

On the overall architecture, I will mention that we don’t currently support a Kafka → S3 → Redshift architecture. I think what you propose is workable; however, I wouldn’t necessarily expect it to ‘just work’ out of the box. The format and file structure in S3 need to match those produced by the S3 Loader (which loads from Kinesis to S3), so more steps may be needed to get the data into shape before the loading would work.

The easier way to get data into Redshift, of course, is to go with the Kinesis-based pipeline, but I do understand that there are reasons one would prefer Kafka.

I hope that’s helpful.