RT pipeline - Kinesis stream read throughput exceeded

#1

Hi,

I’ve set up an events pipeline, something similar to the lambda architecture described here: Is my version of snowplow lambda architecture correct

I have a Scala Stream Collector writing into a Kinesis stream. The stream has 20 shards.
This stream has two consumers: Stream Enrich and Kinesis Firehose.

When I run a load test at about 700 requests per second, I get a provisioned read throughput exceeded alert from AWS, and I feel that shouldn’t happen with 20 shards.
20 shards mean the consumers can read up to 40 MiB per second in total.
I really don’t think I’m reaching that, and I don’t get a write throughput exceeded alert, even though each shard only accepts writes at 1 MiB per second (half the allowed read rate).

Another read limit is 5 GetRecords requests per second per shard, shared across all consumers, so I suspect the two consumers together are polling faster than that.
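
To spell out the math (a quick Python sketch; the per-shard numbers are the documented Kinesis limits, the shard and consumer counts are just my setup):

```python
# Per-shard limits for classic polling consumers (no enhanced fan-out):
WRITE_MIB_PER_SHARD = 1      # writes: 1 MiB/s (or 1,000 records/s) per shard
READ_MIB_PER_SHARD = 2       # reads: 2 MiB/s per shard, shared by all consumers
GET_RECORDS_PER_SHARD = 5    # 5 GetRecords transactions/s per shard, also shared

shards = 20
consumers = 2                # Stream Enrich + Kinesis Firehose

print(f"Aggregate write limit: {shards * WRITE_MIB_PER_SHARD} MiB/s")  # 20 MiB/s
print(f"Aggregate read limit:  {shards * READ_MIB_PER_SHARD} MiB/s")   # 40 MiB/s

# The transaction limit is the easy one to trip: the 5 calls/s/shard budget
# is shared, so each polling consumer effectively gets only a slice of it.
print(f"GetRecords budget per consumer per shard: {GET_RECORDS_PER_SHARD / consumers}/s")
```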

Another interesting thing: Stream Enrich writes to a Kinesis stream that also has two consumers, the Elasticsearch Loader and the S3 Loader. That stream has only 10 shards, but there I don’t get any alerts on read operations.

Did anyone run into this issue or have any idea what could be the cause?

BTW, I checked with AWS: Kinesis Firehose doesn’t support enhanced fan-out at the moment…
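
For consumers that do support it, enhanced fan-out is a per-consumer registration, roughly like this (a boto3 sketch with a made-up stream ARN); each registered consumer then gets its own dedicated 2 MiB/s per shard instead of sharing the polling budget:

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # your region

# Made-up ARN: a registered consumer reads via SubscribeToShard with
# dedicated throughput, instead of competing for the shared 5 GetRecords/s.
resp = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/raw-events-stream",
    ConsumerName="stream-enrich-efo",
)
print(resp["Consumer"]["ConsumerStatus"])  # CREATING, then ACTIVE
```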

Thanks.


#2

What do your GetRecords and ReadProvisionedThroughputExceeded CloudWatch metrics look like?
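
Something like this pulls both metrics for the last hour (a boto3 sketch; the region and stream name are placeholders):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # your region
stream = "raw-events-stream"  # placeholder: your stream name

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

for metric in ("GetRecords.Records", "ReadProvisionedThroughputExceeded"):
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName=metric,
        Dimensions=[{"Name": "StreamName", "Value": stream}],
        StartTime=start,
        EndTime=end,
        Period=300,          # 5-minute buckets
        Statistics=["Sum"],
    )
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(metric, point["Timestamp"].strftime("%H:%M"), point["Sum"])
```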

20 shards is quite high for 700 requests/second on the write side, so it sounds like you’re probably just hitting read limits from Firehose.


#3

Thanks @mike,

This is probably the case. I talked to AWS support, and it seems that I was reaching 6 GetRecords requests per second per shard with those two consumers, above the 5 per-shard limit.

I think more shards are the only solution at the moment, until Kinesis Firehose and/or Stream Enrich support the enhanced fan-out feature.
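
For anyone who lands here later, the reshard itself is a single UpdateShardCount call (a boto3 sketch; the stream name and target count are placeholders):

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # your region

# Placeholder values: doubling the shard count also doubles the shared
# GetRecords budget (5/s per shard), giving two polling consumers headroom.
kinesis.update_shard_count(
    StreamName="raw-events-stream",   # placeholder: your stream name
    TargetShardCount=40,
    ScalingType="UNIFORM_SCALING",    # currently the only supported type
)
```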
