I can imagine most of the production Snowplow / Redshift users here had similar thoughts about this topic so I thought I’d see if I can learn from anyone’s experience.
We currently run Redshift on a 4-node dc1.large cluster. We are currently bound by disk space (so we archive fairly aggressively) and by CPU (heavy queries will max out the CPU on all instances).
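For anyone wanting to check where their own cluster stands before sizing up, Redshift exposes per-node disk usage through the `stv_partitions` system table (requires superuser access). A quick sketch:

```
-- Approximate disk capacity vs. usage per node, in GB
select owner as node,
       sum(capacity) / 1024 as capacity_gb,
       sum(used) / 1024 as used_gb,
       round(100.0 * sum(used) / sum(capacity), 1) as pct_used
from stv_partitions
group by owner
order by owner;
```

If `pct_used` is consistently high across all nodes, you're genuinely storage-bound rather than suffering from skew on one node, which matters for deciding between adding nodes and moving to bigger ones.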
I think we will need about 3-4 times more disk space than we have now, and some more CPU capacity.
The thing is, given the options and the cost of the instances, I'm not sure whether we're better off moving to, say, 12-16 dc1.large instances, 3-4 ds2.xlarge, or even a single dc1.8xlarge, all of which would cost approximately the same. I know Redshift is meant to scale linearly, but there's always a tradeoff between having more, weaker machines and having fewer, stronger ones.
Can anyone advise based on past experience?
Thanks in advance