Kinesis-S3 is taking too much time to store events into S3 buckets

Hi all,
I am following the below steps to process data:

JavaScript Tracker --> Scala Stream Collector --> Stream enrich --> kinesis S3 --> S3

My query is: do I need to run both the Scala Stream Collector and Stream Enrich steps simultaneously in my CLI in order to push the events from the Kinesis stream (Kinesis S3) to the S3 bucket?

I ask because when I run the command to move events from Kinesis S3 to the S3 bucket, it usually takes a long time to complete, though sometimes it finishes within minutes.

Please let us know the overall detailed structure of how it is designed, because we are failing to run the process most of the time.

Hi @shashi,

Do you use Kinesis or stdin/stdout/stderr? If you are using Kinesis, do you run Kinesis S3 on EC2? If not, maybe the transfer takes too long? Maybe a different region would help?

Thanks for the reply, @grzegorzewald.

Firstly, we are using a Kinesis stream, and we are running Kinesis-S3 on an EC2 instance.

We have created all the streams, Kinesis S3, and the S3 buckets in the same region, i.e. us-east-1.

Approximately how much time should the transfer take?

Do I need to change the regions of all these 3 components?

The realtime components are always-on components, i.e. they should always be running so that, when enriched events land in the enriched stream, they are directly consumed by the S3 loader and pushed to S3.
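One thing worth checking in this context: the S3 loader only writes a file to S3 once one of its buffer limits is hit, so a low-volume stream can look "slow" even when everything is healthy. A sketch of the relevant buffer section of the loader's HOCON config (the values here are illustrative assumptions, not recommendations; check them against your own config file):

```
# Illustrative values only -- the loader flushes to S3 as soon as
# the FIRST of these limits is reached.
buffer {
  byteLimit = 4194304   # flush after ~4 MB of buffered events
  recordLimit = 500     # ...or after 500 records
  timeLimit = 60000     # ...or after 60 seconds, whichever comes first
}
```

Lowering `timeLimit` makes files appear in S3 more promptly at the cost of more, smaller objects.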


Hey @grzegorzewald / @BenFradet, please do reply to my query.

Using different regions increases transmission delays.

I have never thought about turning parts of the stack on and off. In general, if you are sure you won't lose data, you may leave them off. But note: Kinesis has limited throughput and EC2 has limited network bandwidth. Connecting the two with a backoff policy may lead to slow processing and, in the worst case, data loss.
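To make the throughput point concrete, here is a back-of-the-envelope sketch (the per-shard numbers are standard Kinesis write limits, but your event rate and event size are assumptions you must fill in yourself). If the provisioned shard count is below what this returns, producers get throttled and downstream consumers such as the S3 loader back off and fall behind:

```python
# Minimal sketch: how many Kinesis shards a given write load needs.
# Per-shard write limits: 1,000 records/s or 1 MiB/s, whichever binds first.
import math

WRITE_RECORDS_PER_SHARD = 1000        # records/s per shard
WRITE_BYTES_PER_SHARD = 1 * 1024**2   # 1 MiB/s per shard

def shards_needed(events_per_sec: float, avg_event_bytes: float) -> int:
    """Minimum shard count to absorb the given write load."""
    by_count = events_per_sec / WRITE_RECORDS_PER_SHARD
    by_bytes = (events_per_sec * avg_event_bytes) / WRITE_BYTES_PER_SHARD
    return max(1, math.ceil(max(by_count, by_bytes)))

# Hypothetical example: 500 events/s averaging 4 KiB each.
print(shards_needed(500, 4 * 1024))  # -> 2 (byte limit binds, not record count)
```

If the number of shards is fine, look instead at the EC2 instance type's network bandwidth and at the loader's backoff settings.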

The realtime pipeline was not designed to run in batches. If you want batches, change the approach. If you want to use POST for requests, use the Kinesis collector -> Kinesis -> raw S3 storage. But still: both need to be running at the same time.
