What is the minimum viable IAM policy for Snowplow operation?


#1

Not that we expect our Snowplow installation to go rogue, but you can never be too careful with your data, right? The IAM setup page provides a rather permissive policy to get things going, but how far can it be restricted? From a very out-of-date setup, we give our snowplow_operator:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "elasticmapreduce:AddInstanceGroups",
                "elasticmapreduce:AddJobFlowSteps",
                "elasticmapreduce:DescribeJobFlows",
                "elasticmapreduce:ModifyInstanceGroups",
                "elasticmapreduce:RunJobFlow",
                "elasticmapreduce:SetTerminationProtection",
                "elasticmapreduce:TerminateJobFlows",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics",
                "cloudwatch:PutMetricData"
            ],
            "Resource": ["*"],
            "Effect": "Allow"
        },
        {
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CancelSpotInstanceRequests",
                "ec2:CreateSecurityGroup",
                "ec2:CreateTags",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInstances",
                "ec2:DescribeKeyPairs",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSpotInstanceRequests",
                "ec2:DescribeSubnets",
                "ec2:DescribeRouteTables",
                "ec2:ModifyImageAttribute",
                "ec2:ModifyInstanceAttribute",
                "ec2:RequestSpotInstances",
                "ec2:RunInstances",
                "ec2:TerminateInstances"
            ],
            "Resource": ["*"],
            "Effect": "Allow",
            "Condition": {
                "StringEquals": {
                    "ec2:Region": "us-west-2"
                }
            }
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "sdb:CreateDomain",
                "sdb:Select",
                "sdb:GetAttributes",
                "sdb:PutAttributes",
                "sdb:BatchPutAttributes",
                "sdb:ListDomains",
                "sdb:DomainMetadata"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::*elasticmapreduce/*",
                "arn:aws:sdb:*:*:*ElasticMapReduce*/*",
                "arn:aws:sdb:*:*:*"
            ]
        }
    ]
}
# and S3 stuff for ETL...

Is this the best we can do? In particular, the Resource wildcards for EMR, CloudWatch, and SimpleDB seem broad; the EC2 wildcard is only marginally better in that it’s at least restricted to a single region. (Note that I’m not even sure these permissions are sufficient for late-model Snowplow releases, since we’re so far behind the times.)
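
For the S3 actions at least, I assume the wildcards can be scoped down to the specific buckets the ETL touches. Here’s a sketch of what I have in mind, with made-up bucket names (snowplow-etl-in, snowplow-etl-processing) standing in for whatever your pipeline actually uses:

{
    "Effect": "Allow",
    "Action": [
        "s3:ListBucket"
    ],
    "Resource": [
        "arn:aws:s3:::snowplow-etl-in",
        "arn:aws:s3:::snowplow-etl-processing"
    ]
},
{
    "Effect": "Allow",
    "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
    ],
    "Resource": [
        "arn:aws:s3:::snowplow-etl-in/*",
        "arn:aws:s3:::snowplow-etl-processing/*"
    ]
}

(s3:ListBucket applies to the bucket ARN while the object actions apply to object paths, hence the two statements.) Whether the EMR, CloudWatch, and SDB actions accept resource-level ARNs at all seems to vary by action, so perhaps those *s are unavoidable.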


#2

Hi @alexc-sigfig - here is our wiki page for the Snowplow operator’s permissions: Setup IAM permissions for operating Snowplow.

Snowplow necessarily requires a lot of AWS permissions to run - it is strongly recommended to set up Snowplow in an exclusive AWS sub-account.


#3

This is maybe more of an AWS question than a Snowplow question, but which resources need to live in the same AWS account? Could I set up a new collector and emr-etl-runner (and their associated S3 buckets) in a sub-account, but load the data into a Redshift cluster owned by a different AWS account?


#4

Yes, sure - you can keep Redshift in a separate AWS account (many of our customers do).
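
The usual wiring is to grant the Redshift-owning account read access to the sub-account’s output bucket, then have your Redshift COPY authenticate with credentials from that account. As a sketch, using placeholder names (a snowplow-shredded-good bucket and a Redshift account ID of 123456789012 - substitute your own), the bucket policy in the sub-account would look roughly like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:root"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::snowplow-shredded-good",
                "arn:aws:s3:::snowplow-shredded-good/*"
            ]
        }
    ]
}

Note that the users or roles in the Redshift account also need matching s3:GetObject / s3:ListBucket permissions of their own - cross-account access requires an allow on both sides.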