I’ve set up an events pipeline, similar to the lambda architecture described here: Is my version of snowplow lambda architecture correct
I have a Scala Stream Collector writing into a Kinesis stream. The stream has 20 shards.
This stream has 2 consumers: Stream Enrich and Kinesis Firehose.
When I run a load test at about 700 requests per second, I get a provisioned read throughput exceeded alert from AWS, and I feel that this shouldn’t happen with 20 shards.
20 shards means that the consumers can read up to 40 MiB per second in total (2 MiB per second per shard, shared between all consumers).
I really don’t think I reach this, and I don’t get a write throughput exceeded alert, even though the collector is only allowed to write at 1 MiB per second per shard (half of the allowed read rate).
Another read limit is 5 read requests per second per shard, so I suspect that the consumers are trying to read at a higher rate than that.
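To make the arithmetic concrete, here is a rough sketch of the limits as I understand them. The function name `stream_limits` and the idea of splitting the 5 reads per second evenly between consumers are my own assumptions for illustration; in practice each consumer polls at its own configured interval, and the limit applies to the sum of all reads on a shard.

```python
# Per-shard limits for a Kinesis stream with shared-throughput consumers
# (i.e. without enhanced fan-out).
READ_MIB_PER_SEC_PER_SHARD = 2
READ_CALLS_PER_SEC_PER_SHARD = 5
WRITE_MIB_PER_SEC_PER_SHARD = 1


def stream_limits(shards: int, consumers: int) -> dict:
    """Return aggregate limits and a per-consumer read budget (even split assumed)."""
    return {
        "total_read_mib_per_sec": shards * READ_MIB_PER_SEC_PER_SHARD,
        "total_write_mib_per_sec": shards * WRITE_MIB_PER_SEC_PER_SHARD,
        # All consumers share the 5 read calls per second on each shard.
        "read_calls_per_sec_per_shard_per_consumer": READ_CALLS_PER_SEC_PER_SHARD / consumers,
        # Slowest polling interval that keeps the sum of reads within the limit.
        "min_poll_interval_ms_per_consumer": 1000 * consumers / READ_CALLS_PER_SEC_PER_SHARD,
    }


limits = stream_limits(shards=20, consumers=2)
for name, value in limits.items():
    print(f"{name}: {value}")
```

With 20 shards and 2 consumers, this gives 40 MiB per second of total read capacity, but only 2.5 read calls per second per shard for each consumer, so a consumer polling a shard more often than roughly every 400 ms could push the shard over the request limit regardless of data volume.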
Another interesting thing is that Stream Enrich also writes to a Kinesis stream with two consumers: the Elasticsearch loader and the S3 loader. That stream has only 10 shards, but there I don’t get any alerts on read operations.
Has anyone run into this issue, or does anyone have an idea what could be the cause?
BTW, I checked with AWS: Kinesis Firehose doesn’t support enhanced fan-out at the moment…