AWS EMR validation error with EmrEtlRunner


#1

When I run:

./snowplow-emr-etl-runner --config ./config.yml --resolver ./resolver.json --start 2016-05-08

I get this error:

D, [2016-05-13T10:32:23.605000 #4125] DEBUG -- : Initializing EMR jobflow
F, [2016-05-13T10:32:27.299000 #4125] FATAL -- :

ArgumentError (AWS EMR API Error (ValidationException): 1 validation error detected: Value null at 'instances.placement.availabilityZone' failed to satisfy constraint: Member must not be null):
    /Users/..mypath../snowplow-emr/snowplow-emr-etl-runner!/gems/elasticity-6.0.7/lib/elasticity/aws_session.rb:33:in `submit'

Any ideas? Let me know if more information is needed.


#2

I added “us-east-1” to the placement option in the config, and the error changed to:

ArgumentError (AWS EMR API Error (ValidationException): Specified Availability Zone is not supported.):


#3

Update for people who find this:

I had to use “us-east-1a” for the placement option.


#4

Hi @dyerw,

Glad you figured it out.

The AWS availability zone is sometimes confused with the AWS region.

Amazon EC2 is hosted in multiple locations world-wide. These locations are composed of regions and Availability Zones. Each region is a separate geographic area. Each region has multiple, isolated locations known as Availability Zones.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html

An availability zone name is the region name followed by a letter (a, b, c, ...). For example, us-east-1a is an availability zone in the us-east-1 region.

While the s3:region and emr:region keys are self-explanatory, emr:placement: requires an availability zone (not a region), and that zone has to belong to the region set in emr:region.

Additionally, only one of the two keys should be set: either placement: or ec2_subnet_id:, but not both. Which one depends on whether you are running in a VPC: use ec2_subnet_id: if you are, and placement: if you are not.
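
For illustration, here is a minimal sketch of how these keys might look in config.yml for a non-VPC setup in us-east-1. It shows only the keys discussed in this thread. The top-level aws: nesting is an assumption based on the usual EmrEtlRunner layout, so check it against your own file.

```yaml
# Hypothetical excerpt of config.yml; all other required keys are omitted.
# The top-level `aws:` nesting is an assumption, not taken from this thread.
aws:
  s3:
    region: us-east-1        # a region
  emr:
    region: us-east-1        # a region
    placement: us-east-1a    # an availability zone inside emr:region (non-VPC)
    ec2_subnet_id:           # left empty because placement is set; fill this in instead when running in a VPC
```

With a VPC the roles would be swapped: placement: stays empty and ec2_subnet_id: holds the subnet ID.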

Regards,
Ihor