Compute profiles of Scala Collector & Enricher


#1

Hello fellow Snowplowers,

I’m setting up an ECS cluster of Scala Collectors & Enrichers, and trying to optimize my resources. What have you found to be the optimal “generous” configuration for both the collector & enricher? Currently, I give each one 2 CPU cores and 512 MB of memory - but I wonder whether they benefit at all from the two cores. Are there any official recommendations regarding machine size?

Thank you,
Victor.


#2

Is this a poor question or a trade secret? :slight_smile: Has anyone done the work to optimize the cost/benefit from the underlying iron on the streaming components?


#3

Hey @vivricanopy, on the Stream Enrich side (or any stream consumer) we tend to think about it in terms of 1 shard needing 1 vCPU. As you scale your Kinesis Stream up or down, you will then need to add/remove vCPUs to remain optimal.
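
To make that rule of thumb concrete, here is a minimal sketch of the arithmetic. The function names and the `vcpus_per_shard` knob are hypothetical and not part of any Snowplow tooling; the only outside fact assumed is that ECS expresses CPU reservations in units, where 1024 units equal 1 vCPU.

```python
import math

# Assumption: ECS task definitions express CPU in units, 1024 units = 1 vCPU.
ECS_CPU_UNITS_PER_VCPU = 1024


def enrich_vcpus_for_shards(shard_count: int, vcpus_per_shard: float = 1.0) -> int:
    """Return the vCPUs to provision for a stream consumer.

    Defaults to the "1 shard needs 1 vCPU" rule of thumb; raise
    vcpus_per_shard (a hypothetical knob) if heavy enrichments need more.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Round up so a fractional requirement never under-provisions.
    return math.ceil(shard_count * vcpus_per_shard)


def ecs_cpu_units(vcpus: int) -> int:
    """Convert whole vCPUs into ECS CPU units."""
    return vcpus * ECS_CPU_UNITS_PER_VCPU


if __name__ == "__main__":
    for shards in (1, 2, 4):
        vcpus = enrich_vcpus_for_shards(shards)
        print(f"{shards} shard(s): {vcpus} vCPU(s), {ecs_cpu_units(vcpus)} ECS CPU units")
```

Treat the result as a starting point only: the per-shard figure is a rule of thumb, and how far you need to move from it depends on which enrichments you run.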

As for unofficial recommendations for instance types:

  1. For the collector, use anything with moderate to high network performance to keep your latency low. We find that on lower-end instances latency quickly climbs to the ~40-50 ms mark under any decent load. In general we tend to use m3.xlarge for high load and m3.large/medium for lower loads.

  2. Most of the same rules apply for the Enrichment - you are going to be reading and writing a lot of data to Kinesis. The thing to note here is that your compute demands can increase drastically depending on how many enrichments you enable.


In short, there is no generic rule for what instances to use - every pipeline has different requirements, so a lot of it will be figuring out what suits your load best and what latency you are happy to deal with in the pipeline!

Hope this helps,
Josh


#4

Thank you @josh! This was immensely helpful. I’m trying to profile the ECS cluster for optimal usage. Currently throwing a c4.large at each instance - I think it’ll do then :slight_smile: