How you configure the collector depends heavily on the amount of traffic you expect, so there is no fixed ruleset for it. The right setup is specific to what you expect to collect.
However, for scaling the collector we generally recommend the following:
- An Autoscaling Group with m3.* instances. Avoid t2.* instances, as they can be throttled, which removes the ability to scale effectively based on CPU.
- Place this ASG behind a Load Balancer.
- Scale the collector ASG based on two metrics:
  - CPU usage: we tend to scale up at 60% utilisation, with step scaling at values greater than 85%. That is, if CPU is above 85% we provision two extra nodes rather than one.
  - Latency: the Load Balancer provides metrics on the latency to the collectors. If this is very high, chances are your collectors are overworked and you need to add nodes to provide a good quality of service. This is highly variable based on the node type you pick: m3.mediums tend to have quite high average latency, whereas an m3.xlarge will happily stick below the 5-10ms mark under load.
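To make the two rules concrete, here is a rough sketch of the decision logic in plain Python. It is only an illustration of the thresholds mentioned above, not how the ASG evaluates them internally. The function name `extra_nodes` and the `latency_limit_ms` parameter are made up for this example; there is deliberately no default latency limit, since the right value depends on your instance type.

```python
def extra_nodes(cpu_percent, latency_ms=None, latency_limit_ms=None,
                scale_up_at=60.0, step_at=85.0):
    """Return how many collector nodes to add for the current readings.

    scale_up_at / step_at mirror the 60% and 85% CPU figures above.
    latency_limit_ms is a hypothetical, caller-supplied limit.
    """
    # CPU rule: one extra node from 60%, two extra nodes above 85%.
    if cpu_percent > step_at:
        by_cpu = 2
    elif cpu_percent >= scale_up_at:
        by_cpu = 1
    else:
        by_cpu = 0

    # Latency rule: add a node when Load Balancer latency exceeds the limit.
    by_latency = 0
    if latency_ms is not None and latency_limit_ms is not None:
        if latency_ms > latency_limit_ms:
            by_latency = 1

    # Take the larger of the two so one busy signal is enough to scale.
    return max(by_cpu, by_latency)
```

For example, `extra_nodes(90)` asks for two nodes, while `extra_nodes(30, latency_ms=40, latency_limit_ms=10)` asks for one even though CPU is low.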
I would recommend that you set up the architecture as described here, with an ASG and Load Balancer, and add very basic rules for scaling based on CPU. You can then tune the rules to work best for you, so you end up with a stable cluster that scales appropriately.
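As a starting point for that basic CPU rule, the sketch below builds the arguments for a step scaling policy. It assumes you use boto3 and attach the policy to a CloudWatch alarm on average CPUUtilization with a threshold of 60; the helper name, policy name and ASG name are placeholders. Step bounds in AWS are offsets from the alarm threshold, so the second step starts at 85 - 60 = 25, which means it kicks in at 85% and above.

```python
def step_scaling_policy_params(asg_name, policy_name="collector-cpu-scale-up",
                               alarm_threshold=60.0, step_at=85.0):
    """Build keyword arguments for the Auto Scaling put_scaling_policy call.

    Assumes the policy is triggered by a CloudWatch alarm whose threshold
    is alarm_threshold (percent CPU). All names here are placeholders.
    """
    offset = step_at - alarm_threshold  # bounds are relative to the alarm
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": policy_name,
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "StepAdjustments": [
            # From the alarm threshold up to step_at: add one node.
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": offset,
             "ScalingAdjustment": 1},
            # From step_at upwards: add two nodes.
            {"MetricIntervalLowerBound": offset,
             "ScalingAdjustment": 2},
        ],
    }


params = step_scaling_policy_params("my-collector-asg")
# To apply it (needs AWS credentials and boto3 installed):
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**params)
```

You would still need to create the CloudWatch alarm and point it at the policy; this only covers the policy itself.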
If you can share some usage scenarios we can better help you define rules for scaling!
Hope this helps,