I was wondering what the recommended way is, as of 2018, to set up EmrEtlRunner for enriching events from the Clojure collector. There is this thread from 2016, "Should I use different EC2 instance types for EMR besides the default?", but I believe some things might have changed since.
So what is the current recommended instance type for:
- master node
- core instances
- task instances
Also, what is the general workflow for figuring out the number of core and task instances, i.e. how do I recognize that my EMR cluster is over- or underpowered?
In my case, the gzipped hourly Tomcat logs on S3 are ~2 MB in size on average (maybe about 15k events/hr?). I think this is quite a small amount.