Architecture question

Hi,

I have a few questions concerning the AWS architecture:

  1. Is it a good idea to run the collector and the enricher on the same EC2 instance, or should I configure two separate ones?

  2. Which Linux distribution is best suited for running the collector and enricher, or does it not really matter?

  3. Which resources should I give an EC2 instance for the collector and the enricher?
    I'm currently running them on a t2.large instance. I know that it ultimately depends on the volume of traffic, but I am interested in which process needs more CPU, RAM, or network performance.

Thanks a lot,
John

Hi John,

We’ve been running Snowplow for 6 months in production and we process about 50M events a month, mostly as click-stream events.

From our experience:

  1. It’s best to separate the two. The enrichment process caches schemas, so you might need to restart the server when schemas change, and you wouldn’t want to restart the collector at the same time. It’s also best to put the collector behind an ELB with at least 2 EC2 instances for safety. Once your events are in the collector stream, they can easily be replayed for enrichment, but if they don’t reach the first stream, they are likely lost forever.

  2. We have not tried anything other than Ubuntu, but the setup is very easy and we have not had any issues with it.

  3. We found that the collector is very lightweight and can run on a t2.micro at our scale. However, the enrichment process is quite memory-hungry and we had to scale up from a t2.small to a t2.medium. CPU-wise, it might depend on whether you run CPU-intensive JS functions to enrich your events.

Our scale is quite humble, but I hope our experience can still shed some light on your questions.
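
For a rough sense of what about 50M events a month means in throughput terms, here is a back-of-the-envelope sketch. The 30-day month, the evenly spread traffic, and the peak factor of 5 are illustrative assumptions, not measurements from our setup, so treat the numbers as an order of magnitude only.

```python
# Back-of-the-envelope throughput estimate for ~50M events per month.
# Assumptions (illustrative only): a 30-day month and evenly spread traffic.
EVENTS_PER_MONTH = 50_000_000
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds

# Hypothetical ratio of peak to average traffic; real click-stream
# traffic is bursty, so measure your own before sizing instances.
ASSUMED_PEAK_FACTOR = 5


def average_events_per_second(events_per_month: int,
                              seconds: int = SECONDS_PER_MONTH) -> float:
    """Average event rate if traffic were perfectly uniform."""
    return events_per_month / seconds


avg = average_events_per_second(EVENTS_PER_MONTH)
peak = avg * ASSUMED_PEAK_FACTOR

print(f"average: {avg:.1f} events/s")  # average: 19.3 events/s
print(f"assumed peak: {peak:.1f} events/s")
```

At around 20 events per second on average, it is not too surprising that a small instance copes with the collector, while the enricher's memory needs depend more on what it caches than on raw throughput.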

Best,
Arthur

Hi Arthur,

Thanks a lot for that insight, it was very helpful!

Cheers,
John