Snowplow on ARM based CPUs

mrchief · October 30, 2020, 6:29pm

Do Snowplow components like Stream Collector and Stream Enrich work on ARM-based CPUs? I see that JDK 8 works on ARM, so is it safe to assume that these apps would also work on ARM?

Has anyone tried this?

mike · November 2, 2020, 2:02am

Yes - you can run most of the Snowplow components on ARM based CPUs (e.g., a Raspberry Pi) with the caveat that these components haven’t been designed with this use case in mind - so you’re likely to hit some speed and resource constraints depending on what you are planning.

In general, and if you can, run one of the trackers on the ARM device that sends data to a collector endpoint that is already running in a cloud - it’ll be far more reliable and be able to scale better than anything running on limited hardware.

mrchief · November 2, 2020, 3:53pm

Thanks Mike!

I just finished deploying the Scala collector to Ubuntu 20.04 arm64, running on an AWS Graviton instance. Everything seems to be working fine AFAICT. The trackers are not a concern for us since they are going to be client/user dependent anyway. Going to test Enricher next.

Could you elaborate or point me to any potential speed issues that you mentioned? Or is it more of a general observation? AWS Graviton instances are newer and more powerful and are cheaper than Intel/AMD based instances. I’m wondering if that would offset the speed issues?

Currently we’re running ~20 t2.micro collector instances and I’m planning to go to t4g.small. Enricher is currently a c5.4xlarge which I’m planning to replace with a r5d.2xlarge (which is memory optimized).

We’re also planning to implement some sort of autoscaling for Enricher, as that is our current choke point. We’re handling ~100 million events per day, with spikes lasting a few hours each day. It is during these spikes that we need the extra enricher capacity; otherwise a c5.large seems to be sufficient.
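For a rough sense of scale, the ~100 million events per day mentioned above averages out to a little over a thousand events per second. The sketch below does that arithmetic. The peak multiplier is a hypothetical value chosen only for illustration; the thread does not say how large the morning spikes are.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


events_per_day = 100_000_000  # figure quoted in the thread
avg_rate = average_events_per_second(events_per_day)

# Hypothetical: assume spikes run at 3x the daily average.
# This multiplier is NOT from the thread; replace it with a measured value.
PEAK_MULTIPLIER = 3
assumed_peak_rate = avg_rate * PEAK_MULTIPLIER

print(f"average rate: {avg_rate:.0f} events/s")
print(f"assumed peak rate: {assumed_peak_rate:.0f} events/s")
```

Sizing the Enricher autoscaling against the measured peak rate, rather than the daily average, is what matters here, since the average hides the spikes.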

mike · November 2, 2020, 9:12pm

This should be fine - this comment was mostly in the context of low-power ARM CPUs as opposed to the ARM-based CPUs that Amazon has developed.

I’d typically go for fewer, more expensive instances and autoscale on these rather than provisioning a large number of small machines. t4g will end up cheaper, but depending on your traffic patterns you may need to deal with CPU bursting credits and slower network performance.
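To make the CPU credit concern concrete, here is a minimal sketch of how a burstable instance's credit balance drains under sustained load. It assumes the standard model where one CPU credit equals one vCPU at 100% for one minute. The t4g.small figures used as defaults (2 vCPUs, 24 credits earned per hour, 576 maximum balance) are assumptions that should be checked against the current AWS documentation, and the function name is made up for this example.

```python
from typing import Optional


def minutes_until_depleted(
    start_balance: float,
    utilization: float,
    vcpus: int = 2,                # assumed t4g.small vCPU count
    credits_per_hour: float = 24,  # assumed t4g.small earn rate
) -> Optional[float]:
    """Estimate minutes until the CPU credit balance reaches zero.

    utilization is the average per-vCPU load as a fraction (0.0 to 1.0).
    Returns None when credits are earned at least as fast as they are spent.
    """
    spent_per_minute = vcpus * utilization    # 1 credit = 1 vCPU-minute at 100%
    earned_per_minute = credits_per_hour / 60
    net_drain = spent_per_minute - earned_per_minute
    if net_drain <= 0:
        return None  # at or below baseline, the balance never runs out
    return start_balance / net_drain


# Starting from an assumed full balance of 576 credits:
at_baseline = minutes_until_depleted(576, 0.20)  # 20% load on both vCPUs
during_spike = minutes_until_depleted(576, 0.80)  # 80% load on both vCPUs

print("at baseline:", at_baseline)
print("during spike (minutes):", during_spike)
```

Under these assumptions a full balance lasts roughly eight hours at 80% load, so a morning burst of a few hours would be absorbed only if the balance has recovered beforehand. In unlimited mode the instance keeps bursting and surplus credits are billed instead, which changes the cost comparison rather than the performance.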

mrchief · November 2, 2020, 9:35pm

This should be fine - this comment was mostly in the context of low-power ARM CPUs as opposed to the ARM-based CPUs that Amazon has developed.

Ah cool, that makes sense and is a relief.

I’d typically go for fewer, more expensive instances and autoscale on these rather than provisioning a large number of small machines. t4g will end up cheaper, but depending on your traffic patterns you may need to deal with CPU bursting credits and slower network performance.

I was wondering the same. We have bursts mainly during the morning. I’m looking into traffic patterns and scaling patterns to find out more.

Do you know if collector is CPU or memory intensive? And enricher?

mike · November 2, 2020, 11:52pm

The collector tends to be more CPU intensive (depending on your buffering settings) and the enricher tends to be a mix of memory / CPU depending on what enrichments you are running.

mrchief · November 3, 2020, 8:27pm

I was able to verify Enricher on an ARM instance as well!

Thanks for all your input and help, @mike!
