I wanted to ask if anyone has experience in implementing auto-scaling solutions for Kinesis shards using AWS real-time architecture. The implementation of scaling the shards and consumers is easy. The problem which I wanted to discuss is managing intermediary shards created after scaling the Kinesis stream.
Let’s look into an example:
A user has load balancer facing stream collectors that write records to a Kinesis data stream (8 shards) with data retention period of two days. Data is consumed from this stream by scala stream enrich application (2 nodes, 2 consumers in each node. Each consumer gets 2 shards).
Now, suppose the traffic goes up and we scale Kinesis stream to have 12 shards. Now the stream has 28 shards in total (8+8 closed and 12 active) and DynamoDB holds leases for these 28 shards until the data expires for 16 closed shards, after which, the intended 12 leases remain. Has anyone dealt with scala stream enrich consumer rebalancing between the given 28 shards (ideal situation would be that the 12 shards that are active would be spread between running consumers as evenly as possible (of course you can’t do that with 8 consumers in our example)).
The answer, that this is impossible to do would also help