Having issues with config.yaml and Contract Violation


#1

Hello, any help here would be extremely appreciated. I have been banging my head against this.

I have been getting this error:

ContractError (Contract violation for return value: Expected: {:aws=>{:access_key_id=>String, :secret_access_key=>String, :s3=>{:region=>String, :buckets=>{:assets=>String, :jsonpath_assets=>#<Contracts::Maybe:0x6cd821c8 @vals=[String, nil]>, :log=>String, :raw=>{:in=>#<Contracts::ArrayOf:0x6192094b @contract=String>, :processing=>String, :archive=>String}, :enriched=>{:good=>String, :bad=>String, :errors=>#<Contracts::Maybe:0x3aa04cf8 @vals=[String, nil]>, :archive=>#<Contracts::Maybe:0x721d4bd9 @vals=[String, nil]>}, :shredded=>{:good=>String, :bad=>String, :errors=>#<Contracts::Maybe:0x615ece16 @vals=[String, nil]>, :archive=>#<Contracts::Maybe:0x172c384b @vals=[String, nil]>}}}, :emr=>{:ami_version=>String, :region=>String, :jobflow_role=>String, :service_role=>String, :placement=>#<Contracts::Maybe:0x5823cfcf @vals=[String, nil]>, :ec2_subnet_id=>#<Contracts::Maybe:0x17204c3e @vals=[String, nil]>, :ec2_key_name=>String, :bootstrap=>#<Contracts::Maybe:0x496d864e @vals=[#<Contracts::ArrayOf:0x57499c0f @contract=String>, nil]>, :software=>{:hbase=>#<Contracts::Maybe:0x358c908b @vals=[String, nil]>, :lingual=>#<Contracts::Maybe:0x1f65b124 @vals=[String, nil]>}, :jobflow=>{:master_instance_type=>String, :core_instance_count=>Contracts::Num, :core_instance_type=>String, :task_instance_count=>Contracts::Num, :task_instance_type=>String, :task_instance_bid=>#<Contracts::Maybe:0x4d50c296 @vals=[Contracts::Num, nil]>}, :additional_info=>#<Contracts::Maybe:0x71172d81 @vals=[String, nil]>, :bootstrap_failure_tries=>Contracts::Num}}, :collectors=>{:format=>String}, :enrich=>{:job_name=>String, :versions=>{:hadoop_enrich=>String, :hadoop_shred=>String}, :continue_on_unexpected_error=>Contracts::Bool, :output_compression=>#<Proc:0x2bc674d8@/Users/goodechilde/Code/emr/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:23 (lambda)>}, :storage=>{:download=>{:folder=>#<Contracts::Maybe:0x5fd73cf4 @vals=[String, nil]>}, 
:targets=>#<Contracts::ArrayOf:0x2be2e854 @contract={:name=>String, :type=>String, :host=>String, :database=>String, :port=>Contracts::Num, :ssl_mode=>#<Contracts::Maybe:0x6cf82c7d @vals=[String, nil]>, :table=>String, :username=>#<Contracts::Maybe:0x5e899a54 @vals=[String, nil]>, :password=>#<Contracts::Maybe:0x7b28bdf4 @vals=[String, nil]>, :es_nodes_wan_only=>#<Contracts::Maybe:0x468f5346 @vals=[Contracts::Bool, nil]>, :maxerror=>#<Contracts::Maybe:0x6b97436b @vals=[Contracts::Num, nil]>, :comprows=>#<Contracts::Maybe:0x2845b098 @vals=[Contracts::Num, nil]>}>}, :monitoring=>{:tags=>#<Contracts::HashOf:0x3e02f94e @key=Symbol, @value=String>, :logging=>{:level=>String}, :snowplow=>#<Contracts::Maybe:0x353d8fb0 @vals=[{:method=>String, :collector=>String, :app_id=>String}, nil]>}}, Actual: {:access_key_id=>"XXXXXXXXXX", :secret_access_key=>"XXXXXXXXXXXXX", :s3=>{:region=>"us-west-2", :buckets=>{:assets=>"s3://snowplow-hosted-assets", :log=>"s3://elasticbeanstalk-XXXXXXXXX/resources/environments/logs/", :raw=>{:in=>["s3://elasticbeanstalk-XXXXXXXXX/resources/environments/logs/publish/e-dadjwp3i2u", "s3://elasticbeanstalk-XXXXXXXXX/resources/environments/logs/publish/e-qic9enm2vf"], :processing=>"s3://ETL-processing/processing", :archive=>"s3://ETL-archive/raw"}, :enriched=>{:good=>"s3://ETL-out/good", :bad=>"s3://ETL-out/bad", :errors=>nil, :archive=>"s3://ETL-archive/enriched"}, :shredded=>{:good=>"s3://ETL-out/shredded/good", :bad=>"s3://ETL-out/shredded/bad", :errors=>nil, :archive=>"s3://ETL-archive/shredded"}}}, :emr=>{:ami_version=>"4.5.0", :region=>"us-west-2", :jobflow_role=>"EMR_EC2_DefaultRole", :service_role=>"EMR_DefaultRole", :placement=>"us-west-2a", :ec2_subnet_id=>nil, :ec2_key_name=>"YYYYY", :bootstrap=>[], :software=>{:hbase=>nil, :lingual=>nil}, :jobflow=>{:master_instance_type=>"m1.medium", :core_instance_count=>2, :core_instance_type=>"m1.medium", :task_instance_count=>0, :task_instance_type=>"m1.medium", :task_instance_bid=>0.015}, 
:bootstrap_failure_tries=>3, :additional_info=>nil}}
Value guarded in: Snowplow::EmrEtlRunner::Cli::load_config
With Contract: Maybe, String => Hash
At: /Users/asdfg/Code/emr/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:134 ):
/Users/asdfg/Code/emr/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:69:in `Contract'
org/jruby/RubyProc.java:271:in `call'
/Users/asdfg/Code/emr/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:147:in `failure_callback'
/Users/asdfg/Code/emr/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:164:in `common_method_added'
/Users/asdfg/Code/emr/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
file:/Users/asdfg/Code/emr/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:37:in `(root)'
org/jruby/RubyKernel.java:1091:in `load'
file:/Users/asdfg/Code/emr/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
org/jruby/RubyKernel.java:1072:in `require'
file:/Users/asdfg/Code/emr/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
/var/folders/3q/kwzknpfn6g572rs7rsyhndjh0000gq/T/jruby4742705589162990248extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

And I am using this config file:

# Credentials can be hardcoded or set in environment variables access_key_id: XXXXXXXX secret_access_key: XXXXXXXXXXXXXX s3: region: us-west-2 buckets: assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket log: s3://elasticbeanstalk-XXXXXXXXX/resources/environments/logs/ raw: in: # Multiple in buckets are permitted - s3://elasticbeanstalk-XXXXXXXXX/resources/environments/logs/publish/e-dadjwp3i2u # e.g. s3://my-in-bucket - s3://elasticbeanstalk-XXXXXXXXX/resources/environments/logs/publish/e-qic9enm2vf processing: s3://ETL-processing/processing archive: s3://ETL-archive/raw # e.g. s3://my-archive-bucket/raw enriched: good: s3://ETL-out/good # e.g. s3://my-out-bucket/enriched/good bad: s3://ETL-out/bad # e.g. s3://my-out-bucket/enriched/bad errors: # Leave blank unless :continue_on_unexpected_error: set to true below archive: s3://ETL-archive/enriched # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched shredded: good: s3://ETL-out/shredded/good # e.g. s3://my-out-bucket/shredded/good bad: s3://ETL-out/shredded/bad # e.g. s3://my-out-bucket/shredded/bad errors: # Leave blank unless :continue_on_unexpected_error: set to true below archive: s3://ETL-archive/shredded # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded emr: ami_version: 4.5.0 region: us-west-2 # Always set this jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles service_role: EMR_DefaultRole # Created using $ aws emr create-default-roles placement: us-west-2a # Set this if not running in VPC. Leave blank otherwise ec2_subnet_id: # Set this if running in VPC. Leave blank otherwise ec2_key_name: YYYYYY bootstrap: [] # Set this to specify custom boostrap actions. Leave empty otherwise software: hbase: # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise. lingual: # Optional. 
To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise. # Adjust your Hadoop cluster below jobflow: master_instance_type: m1.medium core_instance_count: 2 core_instance_type: m1.medium task_instance_count: 0 # Increase to use spot instances task_instance_type: m1.medium task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures additional_info: # Optional JSON string for selecting additional features collectors: format: clj-tomcat # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events enrich: job_name: Snowplow ETL # Give your job a name versions: hadoop_enrich: 1.7.0 # Version of the Hadoop Enrichment process hadoop_shred: 0.9.0 # Version of the Hadoop Shredding process hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP storage: download: folder: # Postgres-only config option. Where to store the downloaded files. 
Leave blank for Redshift targets: - name: ETLtrackdatabase type: redshift host: XXXXXXXXX.us-west-2.redshift.amazonaws.com:5439 # The endpoint as shown in the Redshift console database: ETLtrackdev # Name of database port: 5439 # Default Redshift port ssl_mode: disable # One of disable (default), require, verify-ca or verify-full table: atomic.events username: XXXXXXXXX password: XXXXXXXXX maxerror: 1 # Stop loading on first error, or increase to permit more load errors comprows: 200000 # Default for a 1 XL node cluster. Not used unless --include compupdate specified monitoring: tags: {} # Name-value pairs describing this job logging: level: DEBUG # You can optionally switch to INFO for production snowplow: method: get app_id: ADD HERE # e.g. snowplow collector: ADD HERE # e.g. d3rkrsqld9gmqf.cloudfront.net

Any help on where I could be going wrong would be appreciated. I am new here, so sorry if this is a basic question.

Thank you.


#2

Hi @sevenm,

Your configuration file appears to be fine. However, it is hard to read. Indentation is crucial for YAML files and it’s something I cannot verify. Can I ask you to paste your code in between the triple ticks to maintain the original indentation of your content as shown below?

Before that, I suspect that the issue might be with the last lines of your configuration. You don’t seem to intend to use the snowplow section: your app_id and collector are not filled in with actual values. Can I ask you to either remove the whole snowplow section or comment it out, and then try rerunning your EMR-ETL Runner?

If it still fails, please paste your configuration again, making sure to retain the original indentation:

```

your
   indented
      code

```
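To illustrate why the indentation matters, here is a minimal Ruby sketch using only the standard `yaml` library. The two fragments are hypothetical, trimmed-down examples (not the poster's real file); they contain the same lines and differ only in indentation, yet they parse into differently shaped hashes.

```ruby
require 'yaml'

# Hypothetical fragment with the nesting intact.
nested = <<~YAML
  aws:
    access_key_id: XXXX
    s3:
      region: us-west-2
YAML

# The same lines with the indentation under "aws:" lost.
flat = <<~YAML
  aws:
  access_key_id: XXXX
  s3:
    region: us-west-2
YAML

nested_cfg = YAML.load(nested)
flat_cfg   = YAML.load(flat)

puts nested_cfg.keys.inspect  # ["aws"]
puts flat_cfg.keys.inspect    # ["aws", "access_key_id", "s3"]
puts flat_cfg['aws'].inspect  # nil, because nothing is nested under it any more
```

A contract that expects everything under a single `aws` key would reject the second shape even though no individual value changed.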

Regards,
Ihor


#3

Thanks for the response, Ihor. I redid the file from scratch, deleted the unused portions, and it is working now.


#4

I also got “Contract violation for return value”. Following https://snowplowanalytics.com/blog/2017/06/12/snowplow-r89-plain-of-jars-released-porting-snowplow-to-spark/, I downloaded snowplow_emr_r89_plain_of_jars; following https://github.com/snowplow/snowplow/blob/master/3-enrich/emr-etl-runner/config/config.yml.sample, I updated config.yml as follows:

aws:
  access_key_id: iam
  secret_access_key: iam
  s3:
    region: us-west-2
    buckets:
      assets: s3://snowplow-hosted-assets
      jsonpath_assets: s3://<%= ENV['S3BUCKET'] %>/jsonpaths
      log: s3n://<%= ENV['S3BUCKET'] %>/etl/logs
      raw:
        in: ["s3n://<%= ENV['S3BUCKET'] %>/raw/"]
        processing: s3://<%= ENV['S3BUCKET'] %>/etl/processing
        archive: s3://<%= ENV['S3BUCKET'] %>/archive/raw
      enriched:
        good: s3://<%= ENV['S3BUCKET'] %>/enriched/good
        bad: s3://<%= ENV['S3BUCKET'] %>/enriched/bad
        errors: s3://<%= ENV['S3BUCKET'] %>/enriched/errors
        archive: s3://<%= ENV['S3BUCKET'] %>/enriched/archive
      shredded:
        good: s3://<%= ENV['S3BUCKET'] %>/shredded/good
        bad: s3://<%= ENV['S3BUCKET'] %>/shredded/bad
        errors: s3://<%= ENV['S3BUCKET'] %>/shredded/errors
        archive: s3://<%= ENV['S3BUCKET'] %>/shredded/archive
  emr:
    ami_version: 5.9.0
    region: us-west-2
    jobflow_role: arn:aws:iam::330838374348:instance-profile/sp-dev-EMRInstanceProfile
    service_role: arn:aws:iam::330838374348:role/sp-dev-EMRServiceRole
    placement:
    ec2_subnet_id: subnet-79145409
    ec2_key_name: sp-emr
    bootstrap: []
    software:
      hbase:
      lingual:
    jobflow:
      job_name: clickstream-<%= ENV['SNOWPLOW_ENV'] %>
      master_instance_type: <%= ENV['SNOWPLOW_EMR_MASTER_TYPE'] %>
      core_instance_count: <%= ENV['SNOWPLOW_EMR_CORE_NUM'] %>
      core_instance_type: <%= ENV['SNOWPLOW_EMR_CORE_TYPE'] %>
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015
    bootstrap_failure_tries: 3
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:
collectors:
  format: thrift
enrich:
  versions:
    spark_enrich: 1.10.0
  continue_on_unexpected_error: false
  output_compression: NONE
storage:
  version:
    rdb_loader: 0.14.0
    rdb_shredder: 0.13.0
    hadoop_elasticsearch: 0.1.0
monitoring:
  tags: { 'name' : 'EmrEtlRunner' } # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    collector: <%= ENV['SNOWPLOW_COLLECTOR'] %> # e.g. d3rkrsqld9gmqf.cloudfront.net
    app_id: 'sp.batch' # e.g. snowplow
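For anyone wanting to check what such a templated file expands to before handing it to the runner, the following is a minimal sketch using Ruby's standard `erb` and `yaml` libraries. The helper `load_templated_yaml`, the bucket name and the trimmed-down template are all hypothetical and only for illustration; this is not EmrEtlRunner's own loading code. Note that ERB evaluates the tag contents as Ruby, so the quotes around the variable name have to be plain ASCII quotes.

```ruby
require 'erb'
require 'yaml'

# Hypothetical helper (not taken from EmrEtlRunner): expand the ERB tags
# first, then parse the resulting text as YAML.
def load_templated_yaml(text)
  rendered = ERB.new(text).result
  YAML.load(rendered)
end

# Placeholder value, set here only so the sketch is self-contained.
ENV['S3BUCKET'] = 'my-example-bucket'

# Trimmed-down template in the same style as the config above.
template = <<~CONFIG
  aws:
    s3:
      buckets:
        log: s3n://<%= ENV['S3BUCKET'] %>/etl/logs
        raw:
          in: ["s3n://<%= ENV['S3BUCKET'] %>/raw/"]
CONFIG

config = load_templated_yaml(template)
puts config['aws']['s3']['buckets']['log']
puts config['aws']['s3']['buckets']['raw']['in'].inspect
```

Printing the parsed hash this way makes it easy to confirm that every environment variable was substituted and that each value landed at the nesting level you intended.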

But when I run
./snowplow-emr-etl-runner --config {RUNNER_CONFIG} --resolver {RUNNER_RESOLVER} --enrichments ${RUNNER_ENRICHMENTS} --skip staging

I got
F, [2017-12-08T23:15:57.229000 #23302] FATAL -- :

ReturnContractError (Contract violation for return value:
Expected: {:aws=>{:access_key_id=>String, :secret_access_key=>String, :s3=>{:region=>String, :buckets=>{:assets=>String, :jsonpath_assets=>(String or nil), :log=>String, :raw=>{:in=>(a collection Array of String), :processing=>String, :archive=>String}, :enriched=>{:good=>String, :bad=>String, :errors=>(String or nil), :archive=>(String or nil)}, :shredded=>{:good=>String, :bad=>String, :errors=>(String or nil), :archive=>(String or nil)}}}, :emr=>{:ami_version=>String, :region=>String, :jobflow_role=>String, :service_role=>String, :placement=>(String or nil), :ec2_subnet_id=>(String or nil), :ec2_key_name=>String, :bootstrap=>((a collection Array of String) or nil), :software=>{:hbase=>(String or nil), :lingual=>(String or nil)}, :jobflow=>{:job_name=>String, :master_instance_type=>String, :core_instance_count=>Num, :core_instance_type=>String, :core_instance_ebs=>#<Contracts::Maybe:0x50d04b3e @vals=[{:volume_size=>#<Proc:0x6a8b3a5c@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:26 (lambda)>, :volume_type=>#<Proc:0x5aa4a4a9@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:25 (lambda)>, :volume_iops=>#<Contracts::Maybe:0x6f3681bc @vals=[#<Proc:0x6a8b3a5c@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:26 (lambda)>, nil]>, :ebs_optimized=>#<Contracts::Maybe:0x4d7dac8a @vals=[Contracts::Bool, nil]>}, nil]>, :task_instance_count=>Num, :task_instance_type=>String, :task_instance_bid=>(Num or nil)}, :additional_info=>(String or nil), :bootstrap_failure_tries=>Num}}, :collectors=>{:format=>String}, :enrich=>{:versions=>{:spark_enrich=>String}, :continue_on_unexpected_error=>Bool, :output_compression=>#<Proc:0x35cf23bc@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:24 (lambda)>}, :storage=>{:versions=>{:rdb_shredder=>String, :hadoop_elasticsearch=>String}, :download=>{:folder=>(String or nil)}}, :monitoring=>{:tags=>(Hash<Symbol, String>), :logging=>{:level=>String}, 
:snowplow=>({:method=>String, :collector=>String, :app_id=>String} or nil)}},
Actual: {:aws=>{:access_key_id=>"iam", :secret_access_key=>"iam", :s3=>{:region=>"us-west-2", :buckets=>{:assets=>"s3://snowplow-hosted-assets", :jsonpath_assets=>"s3://jwn-snowplow-nonprod/jsonpaths", :log=>"s3n://jwn-snowplow-nonprod/etl/logs", :raw=>{:in=>["s3n://jwn-snowplow-nonprod/raw/"], :processing=>"s3://jwn-snowplow-nonprod/etl/processing", :archive=>"s3://jwn-snowplow-nonprod/archive/raw"}, :enriched=>{:good=>"s3://jwn-snowplow-nonprod/enriched/good", :bad=>"s3://jwn-snowplow-nonprod/enriched/bad", :errors=>"s3://jwn-snowplow-nonprod/enriched/errors", :archive=>"s3://jwn-snowplow-nonprod/enriched/archive"}, :shredded=>{:good=>"s3://jwn-snowplow-nonprod/shredded/good", :bad=>"s3://jwn-snowplow-nonprod/shredded/bad", :errors=>"s3://jwn-snowplow-nonprod/shredded/errors", :archive=>"s3://jwn-snowplow-nonprod/shredded/archive"}}}, :emr=>{:ami_version=>"5.9.0", :region=>"us-west-2", :jobflow_role=>"arn:aws:iam::330838374348:instance-profile/sp-dev-emr-EMRInstanceProfile-1WK1SCGI04EKJ", :service_role=>"arn:aws:iam::330838374348:role/sp-dev-emr-EMRServiceRole-1F0EVLTF8LOT9", :placement=>nil, :ec2_subnet_id=>"subnet-7925180f", :ec2_key_name=>"sp-emr-dev", :bootstrap=>[], :software=>{:hbase=>nil, :lingual=>nil}, :jobflow=>{:job_name=>"clickstream-jwn-nonprod", :master_instance_type=>"m4.xlarge", :core_instance_count=>3, :core_instance_type=>"i2.2xlarge", :task_instance_count=>0, :task_instance_type=>"m1.medium", :task_instance_bid=>0.015}, :bootstrap_failure_tries=>3, :configuration=>{:"yarn-site"=>{:"yarn.resourcemanager.am.max-attempts"=>"1"}, :spark=>{:maximizeResourceAllocation=>"true"}}, :additional_info=>nil}}, :collectors=>{:format=>"thrift"}, :enrich=>{:versions=>{:spark_enrich=>"1.10.0"}, :continue_on_unexpected_error=>false, :output_compression=>"NONE"}, :storage=>{:version=>{:rdb_loader=>"0.14.0", :rdb_shredder=>"0.13.0", :hadoop_elasticsearch=>"0.1.0"}}, :monitoring=>{:tags=>{:name=>"EmrEtlRunner"}, :logging=>{:level=>"DEBUG"}, 
:snowplow=>{:method=>"get", :collector=>"sp-collector-167750917.us-west-2.elb.amazonaws.com", :app_id=>"sp.batch"}}}
Value guarded in: Snowplow::EmrEtlRunner::Cli::load_config
With Contract: Maybe, String => Hash
At: uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:137 ):
uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:45:in `block in Contract'
uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:154:in `failure_callback'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:80:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in load_config'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:108:in `process_options'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:94:in `get_args_config_enrichments_resolver'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in get_args_config_enrichments_resolver'
uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:37:in `'
org/jruby/RubyKernel.java:973:in `load'
uri:classloader:/META-INF/main.rb:1:in `'
org/jruby/RubyKernel.java:955:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `'

Those environment variables are good, as they worked with the previous EmrEtlRunner.
I checked the actual values against the expected ones: the only missing ones are related to core_instance_ebs, which is optional. Even after adding those 3 parameters, we still got the same error.

Can anyone help? Thanks.
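Comparing two hashes of this size by eye is error-prone, so a mechanical comparison of key paths may help. The following is a minimal Ruby sketch, not part of EmrEtlRunner: `key_paths` is a hypothetical helper, and the two `:storage` fragments are copied from the Expected and Actual hashes in the error above, with `String` and `nil` standing in for the contract objects.

```ruby
# Hypothetical helper: list every key of a nested hash as a dotted path.
def key_paths(hash, prefix = nil)
  hash.flat_map do |key, value|
    path = [prefix, key].compact.join('.')
    value.is_a?(Hash) ? [path] + key_paths(value, path) : [path]
  end
end

# :storage as it appears in the Expected hash (placeholders for contracts).
expected = {
  storage: {
    versions: { rdb_shredder: String, hadoop_elasticsearch: String },
    download: { folder: nil }
  }
}

# :storage as it appears in the Actual hash.
actual = {
  storage: {
    version: { rdb_loader: '0.14.0', rdb_shredder: '0.13.0',
               hadoop_elasticsearch: '0.1.0' }
  }
}

missing = key_paths(expected) - key_paths(actual)
extra   = key_paths(actual) - key_paths(expected)

puts "missing: #{missing.inspect}"
puts "extra:   #{extra.inspect}"
```

On these fragments the sketch reports `storage.versions` and `storage.download` as missing and `storage.version` as extra. A missing key is not necessarily a violation, since some entries are optional ("String or nil"). Still, the posted output does show `:version` in Actual where Expected has `:versions`, no `:download` entry in Actual, and a `:configuration` key under `:emr` that Expected does not list, so those spots may be worth checking first.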


#5

@RichardJ,

Here’s the sample of a correctly formed configuration file for the R89 release. The format differs from version to version. Check your actual config.yml against it and correct it accordingly.