Problem with load testing using avalanche

Hi,

We decided to run a load-testing exercise against our collector using Avalanche, and used the following configuration for the simulation:

export SP_SIM_TIME=60
export SP_BASELINE_USERS=3500
export SP_PEAK_USERS=0

We ran the LinearPeak simulation, but as soon as the simulation started, requests began to drop. We got the following error:

REQUEST Baseline 682 PageView Event 1479297981539 1479297981543 KO status.find.in(200,304,201,202,203,204,205,206,207,208,209), but actually found 503

After the simulation ended, we saw that the total number of requests was around 9 million, and around 5,000 of them failed with the error above.

However, after running EMR and loading the data into Redshift, we saw that only 0.13 million events made it into the table.

Our environment had autoscaling enabled, and we were using m4.large EC2 instances.

What could be causing this problem?

Hi @rahul,

First off, the idea behind the baseline and peak users is to provide a ramping mechanism so that your collector does not go from nothing to 3500 events per second - this instantaneous loading is not a real-world way of seeing how your autoscaling group and instances ramp up to deal with the traffic.

In the future I would recommend something more along the lines of:

export SP_SIM_TIME=60
export SP_BASELINE_USERS=1000
export SP_PEAK_USERS=2500

This will then provide a smooth ramp up and down to simulate your peak activity - which should be what triggers your autoscaling.
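
For example, the full set of environment variables for the run could look something like the below. This is just a sketch: the collector endpoint variable name shown here is illustrative and how you launch the simulation depends on your Avalanche version, so check the Avalanche README for the exact details.

export SP_SIM_TIME=60          # total simulation time, as in your original run
export SP_BASELINE_USERS=1000  # constant baseline load
export SP_PEAK_USERS=2500      # extra users ramped up and back down over the run
export SP_COLLECTOR_URL=http://your-collector.example.com  # illustrative variable name - confirm against the README

# Then launch the LinearPeak simulation the same way you did for the first run.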


On to your actual question! A 503 response code indicates that the server is overloaded and unable to accept any more requests.
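
If your collector sits behind a load balancer (the usual setup with an autoscaling group), it is also worth checking whether the 503s are coming from the collector instances themselves or from the load balancer spilling over because no healthy instance could take the request. A rough way to check with the AWS CLI is below - the load balancer name and time window are placeholders, and this assumes a Classic ELB:

# Placeholders: substitute your own ELB name and the window of your load test
aws cloudwatch get-metric-statistics \
  --namespace AWS/ELB \
  --metric-name SpilloverCount \
  --dimensions Name=LoadBalancerName,Value=my-collector-elb \
  --start-time 2016-11-16T10:00:00Z --end-time 2016-11-16T11:00:00Z \
  --period 300 --statistics Sum

A non-zero SpilloverCount (or a consistently high SurgeQueueLength) points at the instances not keeping up with the incoming requests.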

  • What was the state of the cluster during the load test?
    • Can you provide any metrics around CPU usage, network usage, etc.? (One way to pull these is sketched below.)
  • What metrics are you currently using to scale up?
  • What is the size of the backing EBS volume of the instance?
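
For the CPU and network numbers, something along these lines should work - the instance ID and time window are placeholders, and you would repeat the call with NetworkIn / NetworkOut for the network side:

# Placeholders: use your actual instance IDs and the window of your load test
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2016-11-16T10:00:00Z --end-time 2016-11-16T11:00:00Z \
  --period 300 --statistics Average Maximum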

Once we have a bit more context we should hopefully be able to figure out what exactly is going wrong!

Cheers,
Josh