config.yml accepts an additional_info JSON field, but I'm not sure how to use it to configure the EMR cluster to include Ganglia.
Can you point me in the right direction?
I don't think you can use additional_info for this purpose. To add Ganglia to an EMR cluster, you would need to use the --applications parameter, as described in http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-ganglia.html.
This can be achieved with Dataflow Runner. More info on the latest release is here: https://snowplowanalytics.com/blog/2017/03/31/dataflow-runner-0.2.0-released/.
To be more specific, the configuration file would need to include the value Ganglia on this line: https://github.com/snowplow/dataflow-runner/blob/master/config/cluster.json.sample#L84
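If you generate or template the Dataflow Runner cluster config programmatically, a small helper like the one below can make sure Ganglia ends up in the application list. This is only a sketch. It assumes the list lives under data.applications as an array of application names (e.g. "Hadoop", "Spark"), matching the linked cluster.json.sample. The function name add_ganglia and the sample values are hypothetical. Check the field name against the sample file for the release you are using.

```python
import json


def add_ganglia(config: dict) -> dict:
    """Return a copy of a cluster config with "Ganglia" in data.applications.

    Assumption: the EMR application names live in config["data"]["applications"],
    as in Dataflow Runner's cluster.json.sample. Adjust the path if yours differs.
    """
    # Deep copy via a JSON round-trip so the caller's dict is left untouched.
    updated = json.loads(json.dumps(config))
    # Create the "data" and "applications" keys if they are missing.
    apps = updated.setdefault("data", {}).setdefault("applications", [])
    # Only append once, so calling the helper repeatedly is harmless.
    if "Ganglia" not in apps:
        apps.append("Ganglia")
    return updated


if __name__ == "__main__":
    # Hypothetical, trimmed-down config: only the fields this helper touches.
    sample = {"data": {"name": "my-cluster", "applications": ["Hadoop", "Spark"]}}
    print(json.dumps(add_ganglia(sample)["data"]["applications"]))
    # prints ["Hadoop", "Spark", "Ganglia"]
```

The result is the same as passing Name=Ganglia to the AWS CLI's --applications option when creating the cluster directly. Here it is expressed in the config file that Dataflow Runner reads.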
Thanks, @ihor. I'll keep an eye on Dataflow Runner development.