Kinesis Stream Enrich

Dev · December 3, 2018, 2:54am

We want to know how Kinesis Stream Enrich works internally. Does it work differently from EmrEtlRunner? With EMR we can scale by adding instances, but if Kinesis Stream Enrich is used to perform the enrichments, how can we scale it?

mike · December 3, 2018, 8:58am

Spark Enrich and Scala Stream Enrich share some common code (scala-common-enrich), but they scale in fundamentally different ways. For Kinesis Stream Enrich you'll want to scale up the number of shards, which increases both your write and read throughput. You'll typically want to match the number of KCL workers (Kinesis Client Library; Stream Enrich uses it under the hood) to the number of shards you have.
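As a rough illustration of how shard count follows from traffic, here is a minimal sketch based on the per-shard limits of a provisioned Kinesis shard: 1 MiB/s or 1,000 records/s of writes, and 2 MiB/s of reads shared by all standard (non enhanced fan-out) consumers. It assumes one Kinesis record per event, and `shards_needed` is a hypothetical helper, not part of Stream Enrich or the KCL.

```python
import math

# Per-shard limits of a provisioned Kinesis Data Streams shard.
WRITE_BYTES_PER_SEC = 1 * 1024 * 1024   # 1 MiB/s of writes per shard
WRITE_RECORDS_PER_SEC = 1000            # 1,000 records/s of writes per shard
READ_BYTES_PER_SEC = 2 * 1024 * 1024    # 2 MiB/s of reads per shard, shared by standard consumers


def shards_needed(events_per_sec: float, avg_event_bytes: float, consumer_apps: int = 1) -> int:
    """Hypothetical helper: minimum shards for the stream Stream Enrich reads from.

    Assumes one Kinesis record per event and no enhanced fan-out.
    """
    ingest_bytes = events_per_sec * avg_event_bytes
    by_write_bytes = math.ceil(ingest_bytes / WRITE_BYTES_PER_SEC)
    by_write_records = math.ceil(events_per_sec / WRITE_RECORDS_PER_SEC)
    # Each standard consumer application (e.g. Stream Enrich plus an S3 sink)
    # reads the full stream, so read load scales with the number of apps.
    by_read_bytes = math.ceil(ingest_bytes * consumer_apps / READ_BYTES_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


print(shards_needed(5000, 1500, consumer_apps=2))  # 8
```

With 5,000 events/s of about 1.5 KB each and two consuming applications, both the write-bytes and read-bytes limits come out at 8 shards, so byte volume rather than record count is what drives the shard count here.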

There's a nice example of the relationship between record processors, workers, instances and shards in the AWS documentation.

ykim · June 11, 2019, 11:20pm

Hi Mike,

So if we have 200 shards on the Kinesis stream used by the enrichers, do I need 200 enricher instances for best performance (1 enricher instance == 1 KCL worker)? Is there a way to increase the number of KCL workers in each enricher instance on EC2?

mike · June 11, 2019, 11:54pm

One worker can process more than one shard, so you generally want fewer instances than shards. Exactly how many depends on throughput, message size, desired latency, etc.
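To make the "fewer instances than shards" point concrete, the sketch below sizes instances from a per-instance throughput that you would have to measure yourself (the 2,500 events/s figure is an assumption, not a benchmark). It then spreads shards over workers round-robin. The KCL actually balances shard leases between workers through its lease table rather than strictly round-robin, so this only approximates the steady state. Both helper names are hypothetical.

```python
import math


def instances_needed(total_events_per_sec: float, events_per_sec_per_instance: float,
                     shard_count: int) -> int:
    """Hypothetical sizing helper: load divided by measured per-instance capacity."""
    needed = math.ceil(total_events_per_sec / events_per_sec_per_instance)
    # Each shard lease is held by at most one KCL worker, so workers beyond
    # the shard count would have no shard to process.
    return max(1, min(needed, shard_count))


def spread_leases(shard_count: int, worker_count: int) -> dict[int, list[str]]:
    """Approximate an evenly balanced lease table by assigning shards round-robin."""
    assignment: dict[int, list[str]] = {w: [] for w in range(worker_count)}
    for i in range(shard_count):
        assignment[i % worker_count].append(f"shardId-{i:012d}")
    return assignment


workers = instances_needed(20000, 2500, shard_count=200)
leases = spread_leases(200, workers)
print(workers, [len(shards) for shards in leases.values()])  # 8 [25, 25, 25, 25, 25, 25, 25, 25]
```

Under these assumptions, 200 shards carrying 20,000 events/s need about 8 instances, each holding roughly 25 shards. That is far fewer than one instance per shard.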
