EMR Additional Security Groups


#1

Hi team-snowplow,

I have a Clojure collector and batch pipeline set up in an AWS account dedicated to Snowplow, but the Redshift instance it’s loading into is in another AWS account. When the EMR-initiating instance used to run the storageloader step itself, the cross-account data load worked via a whitelisted security group configured on the persistent instance. To replicate that, I believe I’ll need to apply the same group to the EMR slave instance(s), which means Snowplow needs to set it in AdditionalSlaveSecurityGroups[1] when launching the EMR cluster, right?

Is there some config that will let me pass this specifically, or a general EMR config that gets passed through to the cluster launch?

[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-additional-sec-groups.html
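
For context, here is a rough sketch of where that setting sits at the EMR API level. It builds the `Instances` structure that the EMR RunJobFlow call accepts (for example via boto3's `run_job_flow(Instances=...)`), with the extra group listed under `AdditionalSlaveSecurityGroups`. The helper name, subnet ID, security group ID, instance type and count are all placeholders I made up for illustration, and this says nothing about whether EmrEtlRunner exposes the option in its own config.

```python
from typing import Any, Dict, List


def build_instances_config(
    subnet_id: str,
    extra_slave_groups: List[str],
    instance_type: str = "m4.large",  # placeholder instance type
    instance_count: int = 3,          # placeholder: 1 master + 2 core nodes
) -> Dict[str, Any]:
    """Build the Instances structure for an EMR RunJobFlow request.

    Hypothetical helper: only the dictionary keys come from the EMR API.
    """
    if not extra_slave_groups:
        raise ValueError("at least one additional security group is required")
    for group_id in extra_slave_groups:
        if not group_id.startswith("sg-"):
            raise ValueError(f"not a security group ID: {group_id!r}")

    return {
        "MasterInstanceType": instance_type,
        "SlaveInstanceType": instance_type,
        "InstanceCount": instance_count,
        "Ec2SubnetId": subnet_id,
        "KeepJobFlowAliveWhenNoSteps": False,
        # Attached to core/task nodes in addition to the EMR-managed group.
        "AdditionalSlaveSecurityGroups": list(extra_slave_groups),
    }


# Placeholder IDs; substitute the group whitelisted on the Redshift side.
instances = build_instances_config("subnet-0123example", ["sg-0abc123example"])
print(instances["AdditionalSlaveSecurityGroups"])
# With boto3 this would be passed along the lines of:
#   boto3.client("emr").run_job_flow(Name=..., Instances=instances, ...)
```

The group goes in the additional list rather than replacing the EMR-managed one, so the cluster keeps its managed rules and picks up the whitelisted group on top.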


#3

Between the network and IAM permissions required for rdb_loader, I ended up having to move the Redshift cluster into the same account as the rest of EmrEtlRunner, which thankfully was not a terribly difficult feat.