EMR Additional Security Groups


#1

Hi team-snowplow,

I have a Clojure collector and batch pipeline set up in an AWS account dedicated to Snowplow, but the Redshift instance it loads into is in another AWS account. When the instance that launches EMR used to run the StorageLoader step itself, the cross-account data load worked via a whitelisted security group attached to that persistent instance. To replicate that, I believe I'll need to apply the same group to the EMR slave instance(s), but that means Snowplow needs to set it in AdditionalSlaveSecurityGroups[1] when launching the EMR cluster, right?

Is there a config option that lets me pass this setting specifically, or a general EMR config section that gets passed through to the cluster launch?
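For reference, the linked AWS page corresponds to the `AdditionalSlaveSecurityGroups` (and `AdditionalMasterSecurityGroups`) fields of the `Instances` argument to the EMR `RunJobFlow` API. The sketch below only builds that argument as a plain dict, in the shape boto3's `emr.run_job_flow(Instances=...)` accepts; no AWS call is made. The helper name, instance types and IDs are hypothetical placeholders, and nothing in this thread confirms whether EmrEtlRunner exposes these fields in its own config.

```python
# Hypothetical helper: builds the `Instances` argument for EMR RunJobFlow,
# e.g. boto3.client("emr").run_job_flow(Instances=...). No AWS call is made here.
def build_instances_config(subnet_id, master_type, core_type, core_count,
                           extra_core_sgs, extra_master_sgs=()):
    config = {
        "Ec2SubnetId": subnet_id,            # launch into a VPC subnet
        "MasterInstanceType": master_type,
        "SlaveInstanceType": core_type,
        "InstanceCount": core_count + 1,     # one master plus the core nodes
        "KeepJobFlowAliveWhenNoSteps": False,
        # Extra groups attached to core/task ("slave") nodes alongside the
        # EMR-managed group, e.g. one whitelisted on the Redshift side.
        "AdditionalSlaveSecurityGroups": list(extra_core_sgs),
    }
    # Only include the master field when groups were actually supplied.
    if extra_master_sgs:
        config["AdditionalMasterSecurityGroups"] = list(extra_master_sgs)
    return config


if __name__ == "__main__":
    # Placeholder IDs, not real resources.
    print(build_instances_config("subnet-0abc1234", "m4.large", "m4.large", 2,
                                 ["sg-0redshiftwhitelist"]))
```

The additional groups are applied on top of the EMR-managed security groups, so the whitelisted group from the persistent instance could in principle be reused here.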

[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-additional-sec-groups.html


#3

Between the network and IAM permissions required by rdb_loader, I ended up having to move the Redshift cluster into the same account as the rest of the EmrEtlRunner setup, which thankfully was not a terribly difficult feat.


#4

I know this is an old thread, but it may be useful for anybody here: I have the Snowplow stack in one account and Redshift in another.

EMR is launched into a VPC rather than placed by region. Then we have VPC peering set up between the accounts, with routes configured. The Redshift cluster's security group accepts connections from IP addresses rather than from a security group, because the EMR security group is managed by EMR and not known a priori.
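A minimal sketch of the pieces this setup relies on, assuming Python's standard `ipaddress` module and plain dicts in the shapes EC2's `AuthorizeSecurityGroupIngress` (`IpPermissions`) and `CreateRoute` APIs accept (e.g. via boto3). The CIDRs, peering connection ID and helper names are hypothetical; 5439 is Redshift's default port, so adjust it if your cluster listens elsewhere. VPC peering does not support overlapping CIDR blocks, so the first helper checks that up front.

```python
import ipaddress

REDSHIFT_DEFAULT_PORT = 5439


def check_peering_cidrs(emr_vpc_cidr, redshift_vpc_cidr):
    """Hypothetical check: peered VPCs must not have overlapping CIDR blocks."""
    a = ipaddress.ip_network(emr_vpc_cidr)
    b = ipaddress.ip_network(redshift_vpc_cidr)
    if a.overlaps(b):
        raise ValueError(f"{a} overlaps {b}; these VPCs cannot be peered")
    return a, b


def redshift_ingress_rule(emr_subnet_cidr, port=REDSHIFT_DEFAULT_PORT):
    """Ingress permission (IpPermissions entry) admitting the EMR subnet by CIDR,
    since the EMR-managed security group ID is not known ahead of time."""
    # ip_network raises ValueError on a malformed CIDR or one with host bits set.
    net = ipaddress.ip_network(emr_subnet_cidr)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": str(net),
                      "Description": "Snowplow EMR subnet via VPC peering"}],
    }


def peering_route(peer_vpc_cidr, peering_connection_id):
    """Route table entry sending traffic for the peer VPC over the peering link."""
    return {
        "DestinationCidrBlock": str(ipaddress.ip_network(peer_vpc_cidr)),
        "VpcPeeringConnectionId": peering_connection_id,
    }


if __name__ == "__main__":
    # Placeholder values: Snowplow VPC, Redshift VPC, EMR subnet, peering ID.
    emr_vpc, rs_vpc = check_peering_cidrs("10.0.0.0/16", "10.1.0.0/16")
    print(redshift_ingress_rule("10.0.1.0/24"))
    print(peering_route(str(rs_vpc), "pcx-0example"))  # in the EMR VPC's route table
    print(peering_route(str(emr_vpc), "pcx-0example"))  # in the Redshift VPC's route table
```

Routes are needed in both directions: each VPC's route table points the other VPC's CIDR at the same peering connection, otherwise Redshift's replies never make it back to the EMR nodes.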

If anybody needs more details, feel free to ask.

Cheers.