Kinesis Stream Enrich


We would like to know how Kinesis Stream Enrich works internally. Does it work differently from EmrEtlRunner? With EMR we can scale by adding instances, but if Kinesis Stream Enrich performs the enrichments, how do we scale it?


Spark Enrich and Scala Stream Enrich share some code (scala-common-enrich), but they scale in fundamentally different ways. For Kinesis Stream Enrich you'll want to scale up the number of shards in the stream, which increases both your write and read throughput. Stream Enrich uses the KCL (Kinesis Client Library) under the hood, so you'll typically want to match the number of KCL workers to the number of shards you have.

There’s a nice example of the relationship between record processors, workers, instances and shards in the AWS documentation here.
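
To make the sizing arithmetic concrete, here is a rough Python sketch. It assumes the per-shard limits AWS documents for Kinesis Data Streams (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads shared across consumers) and no enhanced fan-out. The function names `shards_needed` and `assign_shards` are made up for illustration and are not part of Stream Enrich or the KCL, and the even round-robin split is only a simplification of how the KCL balances shard leases between workers.

```python
import math

# Per-shard limits documented by AWS for Kinesis Data Streams
# (shared-throughput consumers, i.e. no enhanced fan-out).
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(in_mb_per_sec, in_records_per_sec, consumers=1):
    """Estimate the shard count for a given ingest rate.

    `consumers` is the number of applications reading the same stream;
    they share each shard's 2 MB/s read limit.
    """
    by_write_bytes = math.ceil(in_mb_per_sec / WRITE_MB_PER_SEC)
    by_write_records = math.ceil(in_records_per_sec / WRITE_RECORDS_PER_SEC)
    by_read = math.ceil(in_mb_per_sec * consumers / READ_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_read)


def assign_shards(num_shards, num_workers):
    """Simplified model of shard distribution across workers.

    Each shard is processed by one worker at a time, so any workers
    beyond the shard count end up with nothing to do.
    """
    counts = [0] * num_workers
    for shard in range(num_shards):
        counts[shard % num_workers] += 1
    return counts


if __name__ == "__main__":
    shards = shards_needed(in_mb_per_sec=3.5, in_records_per_sec=2500)
    print("shards:", shards)
    print("2 workers:", assign_shards(shards, 2))
    print("6 workers:", assign_shards(shards, 6))
```

In this model, running more workers than shards leaves the extra workers idle, which is why matching the worker count to the shard count is the usual starting point.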