Snowplow Avalanche released for Snowplow load-testing


#1

Hi everyone, we are very excited to announce the release of our load-testing project, Avalanche.

This project aims to create a standardised framework for testing Snowplow batch and real-time pipelines under various loads. It will hopefully also expand our own and the community’s knowledge of which configurations work best, and help us discover (and then remove!) any limitations that we come across.

If you have any questions about the release, please post to this topic! Please also share any experiences you have with Avalanche or any ideas for other simulations and tests that we could include.


#3

Hey @josh -

Avalanche is pretty awesome - checking it out, it looks like it's a Scala bulk HTTP event emitter (using Gatling) - but it seems to have quite a bit hard-coded for v1 (as seen in lines 52-54: https://github.com/13scoobie/avalanche/blob/master/src/main/com/snowplowanalytics/avalanche/ExponentialPeak.scala#L52).

I was wondering if you have a roadmap of features, or could call out what's to come with it?

Also, I noticed it has Highcharts built in, which is awesome (great product :slight_smile:) - but that then means anyone who uses it for their company should buy a license to use it.

I had a few ideas to make it more dynamic:

  • Do lookups against Iglu schemas to create more than just pageview/structured events
  • Create a type randomizer (string generator, int, etc.) - possibly grabbing actual data based on past real-time archives (query Elasticsearch?)

Would be very interested to hear your thoughts, and what items you think would be best to include/build upon.


#4

Hey @13scoobie,

Under the hood it is entirely Gatling Highcharts - the only pieces here that are custom to Snowplow are the scenarios, in the form of an Exponential and a Linear peak.

These are written using Gatling as well - if you wanted to add your own you would simply need to write some Scala code with the scenario that you wanted to test!

"Would be very interested to hear your thoughts, and what items you think would be best to include/build upon."

What we can do at the moment with Avalanche is test very high, predictable scale. An event randomiser and a sending randomiser would be the best items to build next.

What you have suggested for the event randomiser sounds spot on - we need to use Iglu Central schemas and some form of value generator to get a large variety of events coming in. A neat addition here would then be the ability to add your own custom Iglu repository to the pool, to generate an even larger variety of events.
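
As a rough illustration of the value-generator idea, here is a minimal sketch in Python (chosen for brevity; Avalanche itself is Scala and nothing like this exists in it today). It walks a small JSON-Schema-style definition and produces a random value for each property. `EXAMPLE_SCHEMA`, `random_value` and `random_event` are all made-up names, and the schema is a hypothetical stand-in rather than anything copied from Iglu Central.

```python
import random
import string

# Hypothetical, simplified schema in the style of a JSON Schema.
# It is illustrative only and is NOT taken from Iglu Central.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "maxLength": 12},
        "quantity": {"type": "integer", "minimum": 1, "maximum": 5},
        "price": {"type": "number", "minimum": 0, "maximum": 100},
        "inStock": {"type": "boolean"},
        "currency": {"enum": ["USD", "EUR", "GBP"]},
    },
}


def random_value(spec, rng):
    """Return a random value that satisfies a (very small) subset of JSON Schema."""
    if "enum" in spec:
        return rng.choice(spec["enum"])
    kind = spec.get("type")
    if kind == "string":
        length = rng.randint(1, spec.get("maxLength", 16))
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    if kind == "integer":
        return rng.randint(spec.get("minimum", 0), spec.get("maximum", 1000))
    if kind == "number":
        return rng.uniform(spec.get("minimum", 0.0), spec.get("maximum", 1000.0))
    if kind == "boolean":
        return rng.random() < 0.5
    if kind == "object":
        # Recurse into each declared property.
        return {
            name: random_value(sub, rng)
            for name, sub in spec.get("properties", {}).items()
        }
    raise ValueError(f"unsupported schema fragment: {spec!r}")


def random_event(schema, seed=None):
    """Generate one random event body; pass a seed for reproducible load tests."""
    return random_value(schema, random.Random(seed))
```

Seeding the generator keeps a load test repeatable, which matters when comparing two pipeline configurations. A real version would need to resolve schemas from an Iglu repository and handle far more of JSON Schema (patterns, formats, required fields, arrays).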

To cap this off we would then need some way to vary the number and type of event contexts attached to each event.
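
A minimal sketch of how that variation might look, again in Python for brevity rather than Avalanche's Scala. It picks a random number of contexts from a pool and wraps them in a contexts envelope. The pool entries under `com.example` are hypothetical, `random_contexts` is a made-up helper, and the envelope schema URI is written from memory of the Snowplow tracker protocol, so please check it before relying on it.

```python
import random

# Envelope schema for a list of contexts, as I recall it from the Snowplow
# tracker protocol (assumption: verify the exact version before use).
CONTEXTS_SCHEMA = "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0"

# Hypothetical pool of context types: schema URI -> function producing its data.
# None of these com.example schemas exist; they are placeholders.
CONTEXT_POOL = {
    "iglu:com.example/user/jsonschema/1-0-0":
        lambda rng: {"tier": rng.choice(["free", "paid"])},
    "iglu:com.example/device/jsonschema/1-0-0":
        lambda rng: {"screenWidth": rng.choice([320, 768, 1920])},
    "iglu:com.example/experiment/jsonschema/1-0-0":
        lambda rng: {"variant": rng.choice(["a", "b"])},
}


def random_contexts(rng, max_contexts=3):
    """Attach between 0 and max_contexts distinct contexts, chosen at random."""
    count = rng.randint(0, min(max_contexts, len(CONTEXT_POOL)))
    # sorted() gives a stable order, so a seeded rng is reproducible.
    chosen = rng.sample(sorted(CONTEXT_POOL), count)
    return {
        "schema": CONTEXTS_SCHEMA,
        "data": [{"schema": uri, "data": CONTEXT_POOL[uri](rng)} for uri in chosen],
    }
```

Sampling without replacement keeps each context type at most once per event; allowing repeats or weighting the pool would be easy variations.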

The other side would be to vary between GET and POST requests and, for the latter, the number of events per request, as this can have interesting side effects downstream in processing.
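
To make that concrete, here is a small Python sketch (not Avalanche's Scala, and not something the project does today) that turns a list of events into a mix of single-event GET requests and multi-event POST requests. It only builds request descriptions and sends nothing. `plan_requests` and its parameters are made up, and the collector paths and payload schema version are written from memory of the Snowplow tracker protocol, so treat them as assumptions to verify against your collector.

```python
import json
import random
from urllib.parse import urlencode

# Assumed collector endpoints and payload schema (verify against your setup).
GET_PATH = "/i"
POST_PATH = "/com.snowplowanalytics.snowplow/tp2"
PAYLOAD_SCHEMA = "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4"


def plan_requests(events, rng, post_ratio=0.5, max_batch=10):
    """Split events into GET requests (one event each) and POST requests
    (a random batch of 1..max_batch events each)."""
    requests = []
    i = 0
    while i < len(events):
        if rng.random() < post_ratio:
            size = rng.randint(1, max_batch)
            batch = events[i:i + size]  # may be shorter at the end of the list
            body = json.dumps({"schema": PAYLOAD_SCHEMA, "data": batch})
            requests.append({"method": "POST", "path": POST_PATH, "body": body})
            i += len(batch)
        else:
            # A GET carries exactly one event, encoded in the query string.
            path = GET_PATH + "?" + urlencode(events[i])
            requests.append({"method": "GET", "path": path, "body": None})
            i += 1
    return requests
```

For example, `plan_requests(events, random.Random(3), post_ratio=0.8, max_batch=50)` would skew the run towards large POST batches, which is the case most likely to show up differences downstream.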

Roadmaps

At the moment we do not have a definitive roadmap for this project, but PRs are always welcome to explore these ideas or anything else you would like to see included in the project!