Error while Running EmrEtlRunner


#1

Hi all,

To run EmrEtlRunner I am using "snowplow_emr_r89_plain_of_jars_rc1.zip" (the R89 version), which contains the storage loader and the runner.

I have created an S3 bucket named “unilog”. Under this bucket I manually created the sub-folders
"processing, archive, good, bad, errors, archive, good, bad, errors, archive" in the S3 section of the AWS Management Console.

Below is my YAML configuration file:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: XXXXXXXX
  secret_access_key: XXXXXXXXXX
  s3:
    region: us-east-2
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: s3://unilog/logs
      raw:
        in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
          - s3://unilog         # e.g. s3://my-old-collector-bucket
          - ADD HERE         # e.g. s3://my-new-collector-bucket
        processing: s3://unilog/raw/processing
        archive: s3://unilog/raw/archive   # e.g. s3://my-archive-bucket/raw
      enriched:
        good: s3://unilog/enriched/good        # e.g. s3://my-out-bucket/enriched/good
        bad: s3://unilog/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://unilog/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://unilog/enriched/archive    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://unilog/shredded/good        # e.g. s3://my-out-bucket/shredded/good
        bad: s3://unilog/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors: s3://unilog/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
        archive: s3://unilog/shredded/archive     # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.5.0
    region: us-east-2        # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement: us-east-2     # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: snowplow
    bootstrap: []           # Set this to specify custom boostrap actions. Leave empty otherwise
    software:
      hbase: "0.92.0"               # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual: "1.1"              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Hadoop cluster below
    jobflow:
      job_name: Snowplow ETL # Give your job a name
      master_instance_type: m1.medium
      core_instance_count: 2
      core_instance_type: m1.medium
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 100    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m1.medium
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: thrift # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
    spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_loader: 0.12.0
    rdb_shredder: 0.12.0        # Version of the Spark Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: 150 # e.g. snowplow
    collector: xxx.xx.xx.xx:8082 # e.g. d3rkrsqld9gmqf.cloudfront.net

Below is the iglu_resolver.json file

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
	"cacheSize": 500,
	"repositories": [
	  {
		"name": "Iglu Central",
		"priority": 0,
		"vendorPrefixes": [ "com.snowplowanalytics" ],
		"connection": {
		  "http": {
			"uri": "http://iglucentral.com"
		  }
		}
	  }
	]
  }
}

I am running EmrEtlRunner with the command below:

**./snowplow-emr-etl-runner --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/**

Below is the error I am getting:

	ReturnContractError (Contract violation for return value:
			Expected: {:aws=>{:access_key_id=>String, :secret_access_key=>String, :s3=>{:region=>String, :buckets=>{:assets=>String, :jsonpath_assets=>(String or nil), :log=>String, :raw=>{:in=>(a collection Array of String), :processing=>String, :archive=>String}, :enriched=>{:good=>String, :bad=>String, :errors=>(String or nil), :archive=>(String or nil)}, :shredded=>{:good=>String, :bad=>String, :errors=>(String or nil), :archive=>(String or nil)}}}, :emr=>{:ami_version=>String, :region=>String, :jobflow_role=>String, :service_role=>String, :placement=>(String or nil), :ec2_subnet_id=>(String or nil), :ec2_key_name=>String, :bootstrap=>((a collection Array of String) or nil), :software=>{:hbase=>(String or nil), :lingual=>(String or nil)}, :jobflow=>{:job_name=>String, :master_instance_type=>String, :core_instance_count=>Num, :core_instance_type=>String, :core_instance_ebs=>#<Contracts::Maybe:0x30cb223b @vals=[{:volume_size=>#<Proc:0x19f1f330@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:26 (lambda)>, :volume_type=>#<Proc:0x143fe09c@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:25 (lambda)>, :volume_iops=>#<Contracts::Maybe:0x41463c56 @vals=[#<Proc:0x19f1f330@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:26 (lambda)>, nil]>, :ebs_optimized=>#<Contracts::Maybe:0x629de8 @vals=[Contracts::Bool, nil]>}, nil]>, :task_instance_count=>Num, :task_instance_type=>String, :task_instance_bid=>(Num or nil)}, :additional_info=>(String or nil), :bootstrap_failure_tries=>Num}}, :collectors=>{:format=>String}, :enrich=>{:versions=>{:spark_enrich=>String}, :continue_on_unexpected_error=>Bool, :output_compression=>#<Proc:0x72b53f27@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:24 (lambda)>}, :storage=>{:versions=>{:relational_database_shredder=>String, :hadoop_elasticsearch=>String}, :download=>{:folder=>(String or nil)}}, :monitoring=>{:tags=>(Hash<Symbol, String>), 
:logging=>{:level=>String}, :snowplow=>({:method=>String, :collector=>String, :app_id=>String} or nil)}},
			Actual: {:aws=>{:access_key_id=>"xxxxxxx", :secret_access_key=>"xxxxx", :s3=>{:region=>"us-east-2", :buckets=>{:assets=>"s3://snowplow-hosted-assets", :jsonpath_assets=>nil, :log=>"s3://unilog/logs", :raw=>{:in=>["s3://unilog"], :processing=>"s3://unilog/raw/processing", :archive=>"s3://unilog/raw/archive"}, :enriched=>{:good=>"s3://unilog/enriched/good", :bad=>"s3://unilog/enriched/bad", :errors=>"s3://unilog/enriched/errors", :archive=>"s3://unilog/enriched/archive"}, :shredded=>{:good=>"s3://unilog/shredded/good", :bad=>"s3://unilog/shredded/bad", :errors=>"s3://unilog/shredded/errors", :archive=>"s3://unilog/shredded/archive"}}}, :emr=>{:ami_version=>"5.5.0", :region=>"us-east-2", :jobflow_role=>"EMR_EC2_DefaultRole", :service_role=>"EMR_DefaultRole", :placement=>"us-east-2", :ec2_subnet_id=>nil, :ec2_key_name=>"snowplow", :bootstrap=>[], :software=>{:hbase=>"0.92.0", :lingual=>"1.1"}, :jobflow=>{:job_name=>"Snowplow ETL", :master_instance_type=>"m1.medium", :core_instance_count=>2, :core_instance_type=>"m1.medium", :core_instance_ebs=>{:volume_size=>100, :volume_type=>"gp2", :volume_iops=>400, :ebs_optimized=>false}, :task_instance_count=>0, :task_instance_type=>"m1.medium", :task_instance_bid=>0.015}, :bootstrap_failure_tries=>3, :configuration=>{:"yarn-site"=>{:"yarn.resourcemanager.am.max-attempts"=>"1"}, :spark=>{:maximizeResourceAllocation=>"true"}}, :additional_info=>nil}}, :collectors=>{:format=>"thrift"}, :enrich=>{:versions=>{:spark_enrich=>"1.9.0"}, :continue_on_unexpected_error=>false, :output_compression=>"NONE"}, :storage=>{:versions=>{:rdb_loader=>"0.12.0", :rdb_shredder=>"0.12.0", :hadoop_elasticsearch=>"0.1.0"}}, :monitoring=>{:tags=>{}, :logging=>{:level=>"DEBUG"}, :snowplow=>{:method=>"get", :app_id=>150, :collector=>"xxx.xx.xx.x:8082"}}}
			Value guarded in: Snowplow::EmrEtlRunner::Cli::load_config
			With Contract: Maybe, String => Hash
			At: uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:137 ):
		uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:45:in `block in Contract'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:154:in `failure_callback'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:80:in `call_with'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in load_config'
		uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:108:in `process_options'
		uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:94:in `get_args_config_enrichments_resolver'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
		uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in get_args_config_enrichments_resolver'
		uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:37:in `<main>'
		org/jruby/RubyKernel.java:973:in `load'
		uri:classloader:/META-INF/main.rb:1:in `<main>'
		org/jruby/RubyKernel.java:955:in `require'
		uri:classloader:/META-INF/main.rb:1:in `(root)'
		uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

I printed and compared the Expected and Actual values, and they look correct to me.

Please help me out!

Is there any configuration I am missing?


#2

Your config file looks newer than the version of Snowplow you are trying to run (R89-RC1) and has a few parameters (like rdb_loader) that weren’t released until R90. I’d recommend testing your configuration file with the latest version of Snowplow (R92) first and then looking at what errors come from that.
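
The mismatch described here can be seen by comparing key names in the two hashes of the error message. A minimal Python sketch, assuming only the `storage` > `versions` keys copied from the Expected and Actual dumps in post #1; the helper name `diff_keys` is hypothetical and not part of Snowplow:

```python
# Keys the R89 contract expects under storage -> versions (from "Expected").
expected_versions = {"relational_database_shredder", "hadoop_elasticsearch"}

# Keys the posted config actually supplies (from "Actual").
actual_versions = {"rdb_loader", "rdb_shredder", "hadoop_elasticsearch"}


def diff_keys(expected, actual):
    """Return (missing, unexpected) key names as sorted lists."""
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    return missing, unexpected


missing, unexpected = diff_keys(expected_versions, actual_versions)
print("missing from config:", missing)
print("not known to this release:", unexpected)
```

Running the same comparison against the contract printed by the release you actually use is a quick way to spot a config template that belongs to a different release.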


#3

Hi Mike,

I changed the version to snowplow_emr_r92_maiden_castle.zip and ran it again.
When I run the command below:

./snowplow-emr-etl-runner --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/

I get this error:

invalid option: --config

What's wrong with my approach?


#4

Hi @sandesh, in R91 the functionality for running Snowplow on EMR moved into a run command, like so:

./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml
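
For wrapper scripts that have to work across releases, the difference amounts to one extra word before the options. A small Python sketch, assuming the binary and config paths used earlier in this thread; `build_command` is a hypothetical helper, and passing `--resolver` and `--enrichments` after `run` is assumed to work the same way as `--config`:

```python
def build_command(release, config, resolver=None, enrichments=None):
    """Build the EmrEtlRunner argument list for a given release number.

    From R91 onwards the `run` subcommand precedes the options;
    earlier releases take the options directly.
    """
    argv = ["./snowplow-emr-etl-runner"]
    if release >= 91:
        argv.append("run")
    argv += ["--config", config]
    if resolver is not None:
        argv += ["--resolver", resolver]
    if enrichments is not None:
        argv += ["--enrichments", enrichments]
    return argv


old_style = build_command(89, "snowplow/4-storage/config/emretlrunner.yml")
new_style = build_command(92, "snowplow/4-storage/config/emretlrunner.yml")
print(" ".join(old_style))
print(" ".join(new_style))
```

The list form can be handed to `subprocess.run` unchanged, which avoids shell quoting issues with paths.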

#5

Hi @alex, thanks for the quick response.
After running the command below:

./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml

I am getting the same error as before.

F, [2017-09-18T07:00:07.031000 #8220] FATAL -- :

ReturnContractError (Contract violation for return value:
		Expected: #<Contracts::Maybe:0x5b1f0f26 @vals=[{:aws=>{:access_key_id=>String, :secret_access_key=>String, :s3=>{:region=>String, :buckets=>{:assets=>String, :jsonpath_assets=>#<Contracts::Maybe:0x3dc2f14 @vals=[String, nil]>, :log=>String, :raw=>{:in=>#<Contracts::CollectionOf:0x3aca2579 @contract=String, @collection_class=Array>, :processing=>String, :archive=>String}, :enriched=>{:good=>String, :bad=>String, :errors=>#<Contracts::Maybe:0x878feb2 @vals=[String, nil]>, :archive=>#<Contracts::Maybe:0x1818390b @vals=[String, nil]>}, :shredded=>{:good=>String, :bad=>String, :errors=>#<Contracts::Maybe:0x31b650e9 @vals=[String, nil]>, :archive=>#<Contracts::Maybe:0x683fe7b5 @vals=[String, nil]>}}}, :emr=>{:ami_version=>String, :region=>String, :jobflow_role=>String, :service_role=>String, :placement=>#<Contracts::Maybe:0x1bea7b0 @vals=[String, nil]>, :ec2_subnet_id=>#<Contracts::Maybe:0x2d4a0671 @vals=[String, nil]>, :ec2_key_name=>String, :bootstrap=>#<Contracts::Maybe:0x464a3430 @vals=[#<Contracts::CollectionOf:0x7bd7d71c @contract=String, @collection_class=Array>, nil]>, :software=>{:hbase=>#<Contracts::Maybe:0x65b2ee36 @vals=[String, nil]>, :lingual=>#<Contracts::Maybe:0x16d4024e @vals=[String, nil]>}, :jobflow=>{:job_name=>String, :master_instance_type=>String, :core_instance_count=>Contracts::Num, :core_instance_type=>String, :core_instance_ebs=>#<Contracts::Maybe:0x6807989e @vals=[{:volume_size=>#<Proc:0x1f013047@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:28 (lambda)>, :volume_type=>#<Proc:0x16361e61@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:27 (lambda)>, :volume_iops=>#<Contracts::Maybe:0x51566ce0 @vals=[#<Proc:0x1f013047@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:28 (lambda)>, nil]>, :ebs_optimized=>#<Contracts::Maybe:0x17e6d07b @vals=[Contracts::Bool, nil]>}, nil]>, :task_instance_count=>Contracts::Num, :task_instance_type=>String, 
:task_instance_bid=>#<Contracts::Maybe:0x37e491e2 @vals=[Contracts::Num, nil]>}, :additional_info=>#<Contracts::Maybe:0x26679788 @vals=[String, nil]>, :bootstrap_failure_tries=>Contracts::Num, :configuration=>#<Contracts::Maybe:0x5bad555b @vals=[#<Contracts::HashOf:0x70b1028d @key=Symbol, @value=#<Contracts::HashOf:0x11d422fd @key=Symbol, @value=String>>, nil]>}}, :collectors=>{:format=>String}, :enrich=>{:versions=>{:spark_enrich=>String}, :continue_on_unexpected_error=>Contracts::Bool, :output_compression=>#<Proc:0x6e489bb8@uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/contracts.rb:26 (lambda)>}, :storage=>{:versions=>{:rdb_shredder=>String, :hadoop_elasticsearch=>String, :rdb_loader=>String}}, :monitoring=>{:tags=>#<Contracts::HashOf:0x6ef4297d @key=Symbol, @value=String>, :logging=>{:level=>String}, :snowplow=>#<Contracts::Maybe:0x460e5ffe @vals=[{:method=>String, :collector=>String, :app_id=>String}, nil]>}}, nil]>,
		Actual: {:aws=>{:access_key_id=>"xxxxxxxxx", :secret_access_key=>"xxxxxxx", :s3=>{:region=>"us-east-2", :buckets=>{:assets=>"s3://snowplow-hosted-assets", :jsonpath_assets=>nil, :log=>"s3://unilog/logs", :raw=>{:in=>["s3://unilog"], :processing=>"s3://unilog/raw/processing", :archive=>"s3://unilog/raw/archive"}, :enriched=>{:good=>"s3://unilog/enriched/good", :bad=>"s3://unilog/enriched/bad", :errors=>"s3://unilog/enriched/errors", :archive=>"s3://unilog/enriched/archive"}, :shredded=>{:good=>"s3://unilog/shredded/good", :bad=>"s3://unilog/shredded/bad", :errors=>"s3://unilog/shredded/errors", :archive=>"s3://unilog/shredded/archive"}}}, :emr=>{:ami_version=>"5.5.0", :region=>"us-east-2", :jobflow_role=>"EMR_EC2_DefaultRole", :service_role=>"EMR_DefaultRole", :placement=>"us-east-2", :ec2_subnet_id=>nil, :ec2_key_name=>"snowplow", :bootstrap=>[], :software=>{:hbase=>"0.92.0", :lingual=>"1.1"}, :jobflow=>{:job_name=>"Snowplow ETL", :master_instance_type=>"m1.medium", :core_instance_count=>2, :core_instance_type=>"m1.medium", :core_instance_ebs=>{:volume_size=>100, :volume_type=>"gp2", :volume_iops=>400, :ebs_optimized=>false}, :task_instance_count=>0, :task_instance_type=>"m1.medium", :task_instance_bid=>0.015}, :bootstrap_failure_tries=>3, :configuration=>{:"yarn-site"=>{:"yarn.resourcemanager.am.max-attempts"=>"1"}, :spark=>{:maximizeResourceAllocation=>"true"}}, :additional_info=>nil}}, :collectors=>{:format=>"thrift"}, :enrich=>{:versions=>{:spark_enrich=>"1.9.0"}, :continue_on_unexpected_error=>false, :output_compression=>"NONE"}, :storage=>{:versions=>{:rdb_loader=>"0.12.0", :rdb_shredder=>"0.12.0", :hadoop_elasticsearch=>"0.1.0"}}, :monitoring=>{:tags=>{}, :logging=>{:level=>"DEBUG"}, :snowplow=>{:method=>"get", :app_id=>150, :collector=>"172.31.38.39:8082"}}}
		Value guarded in: Snowplow::EmrEtlRunner::Cli::load_config
		With Contract: Maybe, String, Bool => Maybe
		At: uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:202 ):
	uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:45:in `block in Contract'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:154:in `failure_callback'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:80:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:191:in `process_options'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:167:in `get_args_config_enrichments_resolver'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:37:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Please help me out.
I have been stuck for 3 days.


#6

Hi @alex/@mike, below is the architecture I am following:

-> Stream enrich -> Kinesis S3 -> S3 -> EmrEtlRunner (shredding) -> PostgreSQL

I need to complete the overall process as soon as possible. I have completed everything up to the S3 stage and am stuck at the EmrEtlRunner (shredding) stage.


#7

app_id (under monitoring) should be a string. You’ll also want to redact your access key and secret key from the post above.
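
The contract in the error output lists `:app_id=>String`, while the parsed config shows `:app_id=>150`, an integer; quoting the value in the YAML file would make it a string. A minimal Python sketch of that kind of type check, assuming a hand-copied subset of the config; `find_type_errors` and `REQUIRED_STRINGS` are hypothetical names that only mimic what the Ruby contract does:

```python
# Subset of the parsed config, copied from the "Actual" dump in the error.
snowplow_monitoring = {
    "method": "get",
    "app_id": 150,  # parsed as an integer because the YAML value is unquoted
    "collector": "xxx.xx.xx.xx:8082",
}

# Fields the contract requires to be strings (from the "Expected" dump).
REQUIRED_STRINGS = ("method", "collector", "app_id")


def find_type_errors(section):
    """List every required field whose value is not a string."""
    errors = []
    for key in REQUIRED_STRINGS:
        value = section.get(key)
        if not isinstance(value, str):
            errors.append(f"{key}: expected str, got {type(value).__name__}")
    return errors


errors_before = find_type_errors(snowplow_monitoring)
errors_after = find_type_errors({**snowplow_monitoring, "app_id": "150"})
print(errors_before)
print(errors_after)
```

A contract violation prints the whole expected and actual structure, so a field-by-field check like this is often quicker than reading the two dumps side by side.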


#8

Thanks Mike,

Now the error has changed.

D, [2017-09-18T07:50:56.496000 #8618] DEBUG -- : Initializing EMR jobflow
F, [2017-09-18T07:51:02.015000 #8618] FATAL -- :

ArgumentError (AWS EMR API Error (ValidationException): Specified Availability Zone is not supported.):
	uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/aws_session.rb:33:in `submit'
	uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/emr.rb:302:in `run_job_flow'
	uri:classloader:/gems/elasticity-6.0.12/lib/elasticity/job_flow.rb:173:in `run'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:542:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:92:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

#9

Hey @mike,
I have set the S3 region to us-east-2.
I have also set the following environment variables:

$ export AWS_ACCESS_KEY_ID=xxxxxxxxxx

$ export AWS_SECRET_ACCESS_KEY=xxxxxxxxxx

$ export AWS_DEFAULT_REGION=us-east-2

Please tell me what the S3 region and AWS_DEFAULT_REGION should be in order to get past this error.


#10

Hey Mike,
I have updated the S3 region to us-east-2a and the error has changed.

Below is the error:

Excon::Error::Socket (getaddrinfo: name or service not known (SocketError)):
	org/jruby/ext/socket/RubySocket.java:344:in `getaddrinfo'
	uri:classloader:/gems/excon-0.52.0/lib/excon/socket.rb:101:in `connect'
	uri:classloader:/gems/excon-0.52.0/lib/excon/ssl_socket.rb:147:in `connect'
	uri:classloader:/gems/excon-0.52.0/lib/excon/socket.rb:29:in `initialize'
	uri:classloader:/gems/excon-0.52.0/lib/excon/ssl_socket.rb:9:in `initialize'
	uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:403:in `socket'
	uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:100:in `request_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/mock.rb:48:in `request_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/instrumentor.rb:26:in `request_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:249:in `request'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/idempotent.rb:27:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:272:in `request'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/idempotent.rb:27:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:272:in `request'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/idempotent.rb:27:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
	uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:272:in `request'
	uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/sax_parser_connection.rb:35:in `request'
	uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/connection.rb:7:in `request'
	uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/storage.rb:612:in `_request'
	uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/storage.rb:607:in `request'
	uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/requests/storage/get_service.rb:21:in `get_service'
	uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/requests/storage/sync_clock.rb:9:in `sync_clock'
	uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:53:in `new_fog_s3_from'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:79:in `initialize'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:89:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

I don't understand this error.
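
A likely reading of the last two errors, offered as an assumption rather than a confirmed diagnosis: `placement` expects an Availability Zone such as `us-east-2a`, whereas the `region` fields and `AWS_DEFAULT_REGION` expect a region such as `us-east-2`. A region in `placement` would explain "Specified Availability Zone is not supported", and a zone in the S3 `region` would lead to an S3 hostname that does not resolve. The Python sketch below classifies a value by its shape only; the regular expressions are a simplification of AWS naming, and `classify` and `check_settings` are hypothetical helpers:

```python
import re

# Simplified shapes: a region ends in a digit, a zone adds one letter.
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")
ZONE_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+[a-z]$")


def classify(value):
    """Return 'availability-zone', 'region' or 'unknown' for a name."""
    if ZONE_RE.match(value):
        return "availability-zone"
    if REGION_RE.match(value):
        return "region"
    return "unknown"


def check_settings(settings):
    """settings maps a field name to (expected_kind, value)."""
    problems = []
    for field, (expected_kind, value) in settings.items():
        actual_kind = classify(value)
        if actual_kind != expected_kind:
            problems.append(f"{field}: '{value}' looks like a {actual_kind}")
    return problems


# Values as described in posts #8 to #10 of this thread.
problems = check_settings({
    "aws.s3.region": ("region", "us-east-2a"),
    "aws.emr.region": ("region", "us-east-2"),
    "aws.emr.placement": ("availability-zone", "us-east-2"),
})
for line in problems:
    print(line)
```

Under that reading, the S3 and EMR `region` values would go back to `us-east-2`, and `placement` would take the zone name (or stay blank when running in a VPC, as the config comment says).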


#11

Hi @mike/@alex, when running the EmrEtlRunner command I get the error below.

D, [2017-09-19T08:14:31.008000 #18465] DEBUG -- : Initializing EMR jobflow
D, [2017-09-19T08:14:35.175000 #18465] DEBUG -- : EMR jobflow j-2NS0GFDL3I303 started, waiting for jobflow to complete...
I, [2017-09-19T08:14:35.184000 #18465]  INFO -- : SnowplowTracker::Emitter initialized with endpoint http://172.31.38.39:8082:80/i
I, [2017-09-19T08:14:35.500000 #18465]  INFO -- : Attempting to send 1 request
W, [2017-09-19T08:14:35.507000 #18465]  WARN -- : bad URI(is not URI?): http://172.31.38.39:8082:80/i?e=ue&ue_px=eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy91bnN0cnVjdF9ldmVudC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5tb25pdG9yaW5nLmJhdGNoL2Vtcl9qb2Jfc3RhcnRlZC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6e319fQ%3D%3D&cx=eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9jb250ZXh0cy9qc29uc2NoZW1hLzEtMC0xIiwiZGF0YSI6W3sic2NoZW1hIjoiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3MubW9uaXRvcmluZy5iYXRjaC9hcHBsaWNhdGlvbl9jb250ZXh0L2pzb25zY2hlbWEvMS0wLTAiLCJkYXRhIjp7Im5hbWUiOiJzbm93cGxvdy1lbXItZXRsLXJ1bm5lciIsInZlcnNpb24iOiIwLjI3LjAtcmM5IiwidGFncyI6e30sImxvZ0xldmVsIjoiREVCVUcifX0seyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5tb25pdG9yaW5nLmJhdGNoL2Vtcl9qb2Jfc3RhdHVzL2pzb25zY2hlbWEvMS0wLTAiLCJkYXRhIjp7Im5hbWUiOiJTbm93cGxvdyBFVEwiLCJqb2JmbG93X2lkIjoiai0yTlMwR0ZETDNJMzAzIiwic3RhdGUiOiJTVEFSVElORyIsImNyZWF0ZWRfYXQiOiIyMDE3LTA5LTE5VDA4OjE0OjM0WiIsImVuZGVkX2F0IjpudWxsLCJsYXN0X3N0YXRlX2NoYW5nZV9yZWFzb24iOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiRWxhc3RpY2l0eSBTM0Rpc3RDcCBTdGVwOiBTaHJlZGRlZCBTMyAtPiBTaHJlZGRlZCBBcmNoaXZlIFMzIiwic3RhdGUiOiJQRU5ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiRWxhc3RpY2l0eSBTM0Rpc3RDcCBTdGVwOiBFbnJpY2hlZCBTMyAtPiBFbnJpY2hlZCBBcmNoaXZlIFMzIiwic3RhdGUiOiJQRU5ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiRWxhc3RpY2l0eSBTM0Rpc3RDcCBTdGVwOiBSYXcgU3RhZ2luZyBTMyAtPiBSYXcgQXJjaGl2ZSBTMyIsInN0YXR
lIjoiUEVORElORyIsImNyZWF0ZWRfYXQiOiIyMDE3LTA5LTE5VDA4OjE0OjM0WiIsInN0YXJ0ZWRfYXQiOm51bGwsImVuZGVkX2F0IjpudWxsfX0seyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5tb25pdG9yaW5nLmJhdGNoL2pvYmZsb3dfc3RlcF9zdGF0dXMvanNvbnNjaGVtYS8xLTAtMCIsImRhdGEiOnsibmFtZSI6IkVsYXN0aWNpdHkgUzNEaXN0Q3AgU3RlcDogU2hyZWRkZWQgSERGUyBfU1VDQ0VTUyAtPiBTMyIsInN0YXRlIjoiUEVORElORyIsImNyZWF0ZWRfYXQiOiIyMDE3LTA5LTE5VDA4OjE0OjM0WiIsInN0YXJ0ZWRfYXQiOm51bGwsImVuZGVkX2F0IjpudWxsfX0seyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5tb25pdG9yaW5nLmJhdGNoL2pvYmZsb3dfc3RlcF9zdGF0dXMvanNvbnNjaGVtYS8xLTAtMCIsImRhdGEiOnsibmFtZSI6IkVsYXN0aWNpdHkgUzNEaXN0Q3AgU3RlcDogU2hyZWRkZWQgSERGUyAtPiBTMyIsInN0YXRlIjoiUEVORElORyIsImNyZWF0ZWRfYXQiOiIyMDE3LTA5LTE5VDA4OjE0OjM0WiIsInN0YXJ0ZWRfYXQiOm51bGwsImVuZGVkX2F0IjpudWxsfX0seyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5tb25pdG9yaW5nLmJhdGNoL2pvYmZsb3dfc3RlcF9zdGF0dXMvanNvbnNjaGVtYS8xLTAtMCIsImRhdGEiOnsibmFtZSI6IkVsYXN0aWNpdHkgU3BhcmsgU3RlcDogU2hyZWQgRW5yaWNoZWQgRXZlbnRzIiwic3RhdGUiOiJQRU5ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiRWxhc3RpY2l0eSBDdXN0b20gSmFyIFN0ZXA6IEVtcHR5IFJhdyBIREZTIiwic3RhdGUiOiJQRU5ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiRWxhc3RpY2l0eSBTM0Rpc3RDcCBTdGVwOiBFbnJpY2hlZCBIREZTIF9TVUNDRVNTIC0%2BIFMzIiwic3RhdGUiOiJQRU5ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiRWxhc3RpY2l0eSBTM0Rpc3RDcCBTdGVwOiBFbnJpY2hlZCBIREZTIC0%2BIFMzIiwic3RhdGUiOiJQRU5
ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiRWxhc3RpY2l0eSBTcGFyayBTdGVwOiBFbnJpY2ggUmF3IEV2ZW50cyIsInN0YXRlIjoiUEVORElORyIsImNyZWF0ZWRfYXQiOiIyMDE3LTA5LTE5VDA4OjE0OjM0WiIsInN0YXJ0ZWRfYXQiOm51bGwsImVuZGVkX2F0IjpudWxsfX0seyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5tb25pdG9yaW5nLmJhdGNoL2pvYmZsb3dfc3RlcF9zdGF0dXMvanNvbnNjaGVtYS8xLTAtMCIsImRhdGEiOnsibmFtZSI6IkVsYXN0aWNpdHkgUzNEaXN0Q3AgU3RlcDogUmF3IFMzIC0%2BIFJhdyBIREZTIiwic3RhdGUiOiJQRU5ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fSx7InNjaGVtYSI6ImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLm1vbml0b3JpbmcuYmF0Y2gvam9iZmxvd19zdGVwX3N0YXR1cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYW1lIjoiU3RhcnQgSEJhc2UgMC45Mi4wIiwic3RhdGUiOiJQRU5ESU5HIiwiY3JlYXRlZF9hdCI6IjIwMTctMDktMTlUMDg6MTQ6MzRaIiwic3RhcnRlZF9hdCI6bnVsbCwiZW5kZWRfYXQiOm51bGx9fV19&dtm=1505808875497&p=srv&tv=rb-0.5.2&aid=unilog&eid=d3ed4e04-8a2a-4e6b-b0ea-6b7b7a9c2b92 (URI::InvalidURIError)
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/uri/rfc3986_parser.rb:67:in `split'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/uri/rfc3986_parser.rb:72:in `parse'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/uri/common.rb:227:in `parse'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/uri/common.rb:714:in `URI'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/emitters.rb:172:in `http_get'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/emitters.rb:143:in `block in send_requests'
org/jruby/RubyArray.java:1733:in `each'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/emitters.rb:140:in `send_requests'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/emitters.rb:100:in `block in flush'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/monitor.rb:214:in `mon_synchronize'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/emitters.rb:99:in `flush'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/emitters.rb:88:in `block in input'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/monitor.rb:214:in `mon_synchronize'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/emitters.rb:85:in `input'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/tracker.rb:131:in `block in track'
org/jruby/RubyArray.java:1733:in `each'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/tracker.rb:131:in `track'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/gems/snowplow-tracker-0.5.2/lib/snowplow-tracker/tracker.rb:281:in `track_unstruct_event'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/monitoring/snowplow.rb:134:in `track_job_started'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:547:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:92:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
org/jruby/RubyKernel.java:979:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:961:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Please help me out.


#12

The problem is the double port attached to your Snowplow collector here:

http://172.31.38.39:8082:80

You cannot override the port 80 for tracking at this time - there is a ticket to fix this, here:

https://github.com/snowplow/snowplow/issues/3236
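
To see why the doubled port produces `URI::InvalidURIError`, here is a minimal Ruby sketch. It assumes the tracker appends its default port 80 to whatever is configured as the collector, and that the request path is `/i`; both are assumptions for illustration, and `parseable?` is a hypothetical helper, not part of the tracker.

```ruby
require 'uri'

# Returns true when Ruby's URI parser accepts the string, false otherwise.
def parseable?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

collector = '172.31.38.39:8082'  # collector value from the config, port included
default_port = 80                # assumption: the tracker appends this port itself

doubled = "http://#{collector}:#{default_port}/i"  # the "/i" path is also an assumption
single  = "http://172.31.38.39:#{default_port}/i"

puts "#{doubled} parseable? #{parseable?(doubled)}"
puts "#{single} parseable? #{parseable?(single)}"
```

The first URL carries two port segments after the host, which the parser rejects; the second has a single port and parses normally.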


#13

Hi @alex, I changed the collector port to 80 and ran the collector, but I am getting the error below.

07:05:44.558 [main] INFO  c.s.s.c.s.sinks.KinesisSink - Creating thread pool of size 10
07:05:45.883 [main] INFO  c.s.s.c.s.sinks.KinesisSink - Stream collectorGoodRegion1 exists and is active
07:05:45.883 [main] INFO  c.s.s.c.s.sinks.KinesisSink - Creating thread pool of size 10
07:05:46.010 [main] INFO  c.s.s.c.s.sinks.KinesisSink - Stream CollectorBadRegion exists and is active
07:05:46.683 [ForkJoinPool-2-worker-1] ERROR c.s.s.c.scalastream.ScalaCollector$ - Failure binding to port
java.lang.RuntimeException: CommandFailed(Bind(Actor[akka://scala-stream-collector/user/handler#1866228152],/172.31.38.39:80,100,List(),None))
		at com.snowplowanalytics.snowplow.collectors.scalastream.ScalaCollector$$anonfun$3.apply(ScalaCollectorApp.scala:118) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at com.snowplowanalytics.snowplow.collectors.scalastream.ScalaCollector$$anonfun$3.apply(ScalaCollectorApp.scala:116) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:251) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.Future$$anonfun$flatMap$1.apply(Future.scala:249) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [snowplow-stream-collector-0.9.0:0.9.0]
		at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [snowplow-stream-collector-0.9.0:0.9.0]

I checked whether some other process is already listening on port 80; the collector only runs successfully on port 8082.


#14

Then disable the Snowplow tracking in your EmrEtlRunner.
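
As a rough sketch of what "disabled" means here: commenting out the `snowplow:` block under `monitoring:` leaves no such key in the parsed config. The fragment below is a trimmed, hypothetical version of the monitoring section, and it assumes EmrEtlRunner skips tracking when that key is absent.

```ruby
require 'yaml'

# Hypothetical, trimmed-down monitoring section with the snowplow block commented out.
CONFIG_YAML = <<~YAML
  monitoring:
    tags: {}
    logging:
      level: DEBUG
    #snowplow:
    #  method: get
    #  app_id: unilog
    #  collector: 172.31.38.39:8082
YAML

config = YAML.safe_load(CONFIG_YAML)

# Commented-out lines are ignored by the YAML parser, so the key is simply missing.
tracking = config.dig('monitoring', 'snowplow')
tracking_enabled = !tracking.nil?

puts "Snowplow tracking enabled: #{tracking_enabled}"
```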


#15

Hi @alex, I changed my EmrEtlRunner.yml file as shown below.

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: XXXXXXXXXXX
  secret_access_key: XXXXXXXXXX
  s3:
	region: us-east-1
	buckets:
	  assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
	  jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
	  log: s3://unilogregion1/logs
	  raw:
		in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
		  - s3://unilogregion1         # e.g. s3://my-old-collector-bucket
		processing: s3://unilogregion1/raw/processing
		archive: s3://unilogregion1/raw/archive   # e.g. s3://my-archive-bucket/raw
	  enriched:
		good: s3://unilogregion1/enriched/good        # e.g. s3://my-out-bucket/enriched/good
		bad: s3://unilogregion1/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
		errors: s3://unilogregion1/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://unilogregion1/enriched/archive    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
	  shredded:
		good: s3://unilogregion1/shredded/good        # e.g. s3://my-out-bucket/shredded/good
		bad: s3://unilogregion1/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
		errors: s3://unilogregion1/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://unilogregion1/shredded/archive     # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
	ami_version: 5.5.0
	region: us-east-1a       # Always set this
	jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
	service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
	placement: us-east-1a     # Set this if not running in VPC. Leave blank otherwise
	ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
	ec2_key_name: snowplow
	bootstrap: []           # Set this to specify custom boostrap actions. Leave empty otherwise
	software:
	  hbase: "0.92.0"               # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
	  lingual: "1.1"              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
	# Adjust your Hadoop cluster below
	jobflow:
	  job_name: Snowplow ETL # Give your job a name
	  master_instance_type: m1.medium
	  core_instance_count: 2
	  core_instance_type: m1.medium
	  core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
		volume_size: 100    # Gigabytes
		volume_type: "gp2"
		volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
		ebs_optimized: false # Optional. Will default to true
	  task_instance_count: 0 # Increase to use spot instances
	  task_instance_type: m1.medium
	  task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
	bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
	configuration:
	  yarn-site:
		yarn.resourcemanager.am.max-attempts: "1"
	  spark:
		maximizeResourceAllocation: "true"
	additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: thrift # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
	spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
	rdb_loader: 0.12.0
	rdb_shredder: 0.12.0        # Version of the Spark Shredding process
	hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
	level: DEBUG # You can optionally switch to INFO for production
  #snowplow:
	#method: get
	#app_id: unilog # e.g. snowplow
	#collector: 172.31.38.39:8082 # e.g. d3rkrsqld9gmqf.cloudfront.net

I then ran EmrEtlRunner with the command below.

./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/ --skip staging

I am getting the following error.

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-2D3S0822MEV2O failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATED_WITH_ERRORS [VALIDATION_ERROR] ~ elapsed time n/a [ - 2017-09-20 09:54:05 +0000]
 - 1. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 2. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 3. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 6. Elasticity Spark Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 7. Elasticity Custom Jar Step: Empty Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
 - 8. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 9. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 10. Elasticity Spark Step: Enrich Raw Events: CANCELLED ~ elapsed time n/a [ - ]
 - 11. Elasticity S3DistCp Step: Raw S3 -> Raw HDFS: CANCELLED ~ elapsed time n/a [ - ]
 - 12. Elasticity S3DistCp Step: Raw s3://unilogregion1/ -> Raw Staging S3: CANCELLED ~ elapsed time n/a [ - ]
 - 13. Start HBase 0.92.0: CANCELLED ~ elapsed time n/a [ - ]):
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:572:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:92:in `run'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
	uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
	uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
	org/jruby/RubyKernel.java:979:in `load'
	uri:classloader:/META-INF/main.rb:1:in `<main>'
	org/jruby/RubyKernel.java:961:in `require'
	uri:classloader:/META-INF/main.rb:1:in `(root)'
	uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

I am at the last stage of processing; please help me out.


#16

Hey @alex/@mike, please help me out… I am stuck at the final stage.

Please correct me if I am doing anything wrong.


#17

What’s in your error logs? The first line of your output above is:

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-2D3S0822MEV2O failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
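
The status lines under that error can also be summarised quickly. The Ruby sketch below parses an excerpt of the output you posted (steps 3 to 12 are left out for brevity); the regular expressions and names are my own and are not part of EmrEtlRunner.

```ruby
# Excerpt of the EmrEtlRunner output quoted in this thread (steps 3-12 omitted).
OUTPUT = <<~LOG
  Snowplow ETL: TERMINATED_WITH_ERRORS [VALIDATION_ERROR] ~ elapsed time n/a [ - 2017-09-20 09:54:05 +0000]
   - 1. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
   - 2. Elasticity S3DistCp Step: Enriched S3 -> Enriched Archive S3: CANCELLED ~ elapsed time n/a [ - ]
   - 13. Start HBase 0.92.0: CANCELLED ~ elapsed time n/a [ - ]
LOG

# Jobflow line: "<name>: <STATE> [<REASON>] ~ ..."
JOBFLOW_RE = /^(?<name>.+?): (?<state>[A-Z_]+) \[(?<reason>[A-Z_]+)\] ~/
# Step line: " - <n>. <name>: <STATE> ~ ..."
STEP_RE = /^\s*- (?<index>\d+)\. (?<name>.+): (?<state>[A-Z_]+) ~/

jobflow = OUTPUT.lines.map { |line| JOBFLOW_RE.match(line) }.compact.first
steps = OUTPUT.lines.map { |line| STEP_RE.match(line) }.compact

# Count how many steps ended in each state.
states = steps.each_with_object(Hash.new(0)) { |m, counts| counts[m[:state]] += 1 }

puts "#{jobflow[:name]}: #{jobflow[:state]} (#{jobflow[:reason]})"
puts "step states: #{states.inspect}"
```

When every step is CANCELLED and the jobflow itself reports VALIDATION_ERROR, it suggests the cluster was rejected before any step ran, so the reason is more likely to show up in the EMR console than in the Hadoop step logs.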


#18

Thanks for the reply, @mike.
Below is the EMR cluster screenshot.


In the AWS CLI console I am getting the following output:

ubuntu@ip-172-31-38-39:~$ ./snowplow-emr-etl-runner run --config snowplow/4-storage/config/emretlrunner.yml --resolver snowplow/3-enrich/config/iglu_resolver.json --enrichments snowplow/3-enrich/config/enrichments/ --skip staging

D, [2017-09-21T14:04:54.431000 #12070] DEBUG -- : Initializing EMR jobflow
D, [2017-09-21T14:04:58.510000 #12070] DEBUG -- : EMR jobflow j-XC1NQI6YIQLL started, waiting for jobflow to complete...
W, [2017-09-21T14:09:00.075000 #12070]  WARN -- : Job failed. 2 tries left...
W, [2017-09-21T14:09:00.076000 #12070]  WARN -- : Bootstrap failure detected, retrying in 539 seconds...
D, [2017-09-21T14:17:59.079000 #12070] DEBUG -- : Initializing EMR jobflow
D, [2017-09-21T14:18:01.205000 #12070] DEBUG -- : EMR jobflow j-B0TYZ01TIWEB started, waiting for jobflow to complete...
W, [2017-09-21T14:22:02.765000 #12070]  WARN -- : Job failed. 1 tries left...
W, [2017-09-21T14:22:02.765000 #12070]  WARN -- : Bootstrap failure detected, retrying in 202 seconds...

Please help me resolve this error.

Below is my emretlrunner.yml file:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: XXXXXXX
  secret_access_key: XXXXXXXXXXX
  keypair: Snowplowkeypair
  key-pair-file: /home/ubuntu/snowplow/4-storage/config/Snowplowkeypair.pem
  region: us-east-1
  s3:
	region: us-east-1
	buckets:
	  assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
	  jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
	  log: s3://unilogregion1/logs
	  raw:
		in:                  # This is a YAML array of one or more in buckets - you MUST use hyphens before each entry in the array, as below
		  - s3://unilogregion1         # e.g. s3://my-old-collector-bucket
		processing: s3://unilogregion1/raw/processing
		archive: s3://unilogregion1/raw/archive   # e.g. s3://my-archive-bucket/raw
	  enriched:
		good: s3://unilogregion1/enriched/good        # e.g. s3://my-out-bucket/enriched/good
		bad: s3://unilogregion1/enriched/bad       # e.g. s3://my-out-bucket/enriched/bad
		errors: s3://unilogregion1/enriched/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://unilogregion1/enriched/archive    # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
	  shredded:
		good: s3://unilogregion1/shredded/good        # e.g. s3://my-out-bucket/shredded/good
		bad: s3://unilogregion1/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
		errors: s3://unilogregion1/shredded/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
		archive: s3://unilogregion1/shredded/archive     # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
	ami_version: 5.5.0
	region: us-east-1a       # Always set this
	jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
	service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
	placement: us-east-1a      # Set this if not running in VPC. Leave blank otherwise
	ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
	ec2_key_name: Snowplowkeypair
	bootstrap: []           # Set this to specify custom boostrap actions. Leave empty otherwise
	software:
	  hbase: "0.92.0"               # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
	  lingual: "1.1"              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
	# Adjust your Hadoop cluster below
	jobflow:
	  job_name: Snowplow ETL # Give your job a name
	  master_instance_type: m1.medium
	  core_instance_count: 2
	  core_instance_type: m1.medium
	  core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
		volume_size: 100    # Gigabytes
		volume_type: "gp2"
		volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
		ebs_optimized: false # Optional. Will default to true
	  task_instance_count: 0 # Increase to use spot instances
	  task_instance_type: m1.medium
	  task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
	bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
	configuration:
	  yarn-site:
		yarn.resourcemanager.am.max-attempts: "1"
	  spark:
		maximizeResourceAllocation: "true"
	additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: thrift # For example: 'clj-tomcat' for the Clojure Collector, 'thrift' for Thrift records, 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs or 'ndjson/urbanairship.connect/v1' for UrbanAirship Connect events
enrich:
  versions:
	spark_enrich: 1.9.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: NONE # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
	rdb_loader: 0.12.0
	rdb_shredder: 0.12.0        # Version of the Spark Shredding process
	hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
monitoring:
  tags: {} # Name-value pairs describing this job
  logging:
	level: DEBUG # You can optionally switch to INFO for production
  #snowplow:
	#method: get
	#app_id: unilog # e.g. snowplow
	#collector: 172.31.38.39:8082 # e.g. d3rkrsqld9gmqf.cloudfront.net

Please guide me in the right direction.
Thanks,
Sandesh P


#19

Hey @alex/@mike,
I solved this error by adding the availability zone to the placement setting and leaving the entries in the software section blank, i.e.:

placement: us-east-1a
software:
  hbase:
  lingual:
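
A short Ruby sketch for checking that the change is in place, in case it helps someone else. The YAML fragment is a trimmed, hypothetical version of the `emr` section of the config above, and the check itself is my own illustration rather than anything EmrEtlRunner does.

```ruby
require 'yaml'

# Trimmed, hypothetical emr section reflecting the fix described above.
EMR_YAML = <<~YAML
  emr:
    ami_version: 5.5.0
    placement: us-east-1a
    software:
      hbase:
      lingual:
YAML

emr = YAML.safe_load(EMR_YAML).fetch('emr')

# Blank YAML values parse as nil, so only explicitly versioned entries remain.
software = emr.fetch('software') || {}
requested = software.reject { |_name, version| version.nil? }

puts "placement: #{emr['placement']}"
puts "extra software requested: #{requested.keys.inspect}"
```

With `hbase` and `lingual` left blank, nothing extra is requested, which matches the absence of the "Start HBase 0.92.0" step that appeared in the failing runs.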


#20

Thanks for letting us know @sandesh!