Failed to start EmrEtlRunner


#1

Hi Guys,

I have been checking most of the posts online and in this forum, but still can’t get the snowplow-emr-etl-runner to work.

The command I used is:
./snowplow-emr-etl-runner --debug --config snwplw-config.yml --resolver iglu_resolver.json

I keep getting this error:

Value guarded in: Snowplow::EmrEtlRunner::Cli::load_config
With Contract: Maybe, String => Hash
At: /home/user/loader/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/cli.rb:134 ):
/home/user/loader/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:69:in `Contract'
org/jruby/RubyProc.java:271:in `call'
/home/user/loader/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:147:in `failure_callback'
/home/user/loader/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:164:in `common_method_added'
/home/user/loader/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
file:/home/user/loader/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:37:in `(root)'
org/jruby/RubyKernel.java:1091:in `load'
file:/home/user/loader/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
org/jruby/RubyKernel.java:1072:in `require'
file:/home/user/loader/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
/tmp/jruby5396713056082654976extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

My config file is as below (masked credentials):

Could someone please help me get it to work?

Thank you very much!


#2

Hi @mythsam,

Could you remove the unnecessary quotes and try again? For example log: s3://st-snwplw-sl/logs instead of log: "s3://st-snwplw-sl/logs".

Check it against the example here.

Regards,
Ihor


#3

Hi Ihor,

Thank you for your reply. I gave it a try and removed all the unnecessary " quotes (since I don't use the software section, the whole YAML file now has no quotes at all), and updated the jobflow section to match the example.

I have updated the gist with the changes; however, I am still getting the same error.

Could you see if there is anything else wrong?

Thanks a lot!

Sam


#4

I got the config working now. It failed because of a wrong attribute name: I had written jobflow_name where it should be jobflow_role. After fixing that, the config loads.
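For anyone hitting the same contract error, a near-miss key name like this can be caught before launching the runner. The following is a minimal sketch, not part of EmrEtlRunner: the key list contains only names mentioned in this thread (it is not the full schema), the sample values are placeholders, and keys are picked out with a simple regex rather than a real YAML parser.

```python
import difflib
import re

# Key names mentioned in this thread. This is NOT the full EmrEtlRunner
# schema; extend it from the example config for your release.
KNOWN_KEYS = ["jobflow_role", "ec2_key_name", "ec2_subnet_id",
              "placement", "region", "log"]

# Matches "key:" at the start of a line, ignoring indentation.
KEY_PATTERN = re.compile(r"^\s*([A-Za-z0-9_]+)\s*:")


def suspicious_keys(config_text, known=KNOWN_KEYS):
    """Map each key that closely resembles, but does not equal,
    a known key to that known key."""
    findings = {}
    for line in config_text.splitlines():
        match = KEY_PATTERN.match(line)
        if not match:
            continue
        key = match.group(1)
        if key in known:
            continue
        close = difflib.get_close_matches(key, known, n=1, cutoff=0.7)
        if close:
            findings[key] = close[0]
    return findings


# Hypothetical config fragment with placeholder values.
sample = """\
emr:
  region: REGION
  jobflow_name: ROLE_NAME
  placement: ZONE
"""
print(suspicious_keys(sample))
```

Keys that are not close to anything in the list are ignored on purpose, since the list is incomplete.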

However, when I run the command, I get an error at the staging step: the files can't be moved from the log bucket to the processing bucket.

./snowplow-emr-etl-runner --debug --config config.yml --resolver config/iglu_resolver.json

D, [2016-05-06T20:41:21.334000 #20758] DEBUG -- : Staging raw logs...
moving files from s3://st-snwplw-logs/ to s3://st-snwplw-sl/processing/
F, [2016-05-06T20:41:26.278000 #20758] FATAL -- :

NoMethodError (undefined method `files' for nil:NilClass):
/home/user/loader/snowplow-emr-etl-runner!/gems/sluice-0.2.2/lib/sluice/storage/s3/s3.rb:469:in `process_files'
org/jruby/ext/thread/Mutex.java:149:in `synchronize'
/home/user/loader/snowplow-emr-etl-runner!/gems/sluice-0.2.2/lib/sluice/storage/s3/s3.rb:437:in `process_files'
org/jruby/RubyKernel.java:1511:in `loop'
/home/user/loader/snowplow-emr-etl-runner!/gems/sluice-0.2.2/lib/sluice/storage/s3/s3.rb:428:in `process_files'

Any suggestions on this issue?

Thanks a lot!

Sam


#5

Hi @mythsam,

The most likely cause of this error is a problem accessing your raw:in bucket.

Could you please try the following:

  1. Use the AWS CLI to confirm that your AWS credentials have access to your in bucket?
  2. Check ec2_key_name exists in the same region as the one you stated in s3:region
  3. Use placement instead of ec2_subnet_id if not running in VPC
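The first two checks can be scripted. This is a rough sketch that assumes the AWS CLI is installed and configured with the same credentials as the config file. The bucket is the one from the log output in this thread; the key pair name and region are placeholders, and the helper names are made up for illustration.

```python
import subprocess


def build_check_commands(in_bucket, key_name, region):
    """Return AWS CLI argument lists for checks 1 and 2."""
    return [
        # 1. Can these credentials list the raw:in bucket?
        ["aws", "s3", "ls", in_bucket],
        # 2. Does the key pair exist in the configured region?
        ["aws", "ec2", "describe-key-pairs",
         "--key-names", key_name, "--region", region],
    ]


def run_checks(commands):
    """Run each command and return (command string, passed) pairs.
    Raises FileNotFoundError if the aws executable is not installed."""
    results = []
    for argv in commands:
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append((" ".join(argv), completed.returncode == 0))
    return results


# "my-key-pair" and "us-east-1" are placeholders; use your own values.
commands = build_check_commands("s3://st-snwplw-logs/", "my-key-pair", "us-east-1")
for argv in commands:
    print(" ".join(argv))
```

Calling run_checks(commands) executes them; a non-zero exit code from either command suggests a credentials, bucket or region problem rather than a bug in the runner.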

Regards,
Ihor


#7

Hi Ihor,

I have checked each of your points:

  1. Use the AWS CLI to confirm that your AWS credentials have access to your in bucket?
    Yes, it is accessible

  2. Check ec2_key_name exists in the same region as the one you stated in s3:region
    Yes, ec2_key_name exists in Key Pairs under the same region as in the config

  3. Use placement instead of ec2_subnet_id if not running in VPC
    It is running in a VPC. I tried both placement and ec2_subnet_id, and I still get the same error.

D, [2016-05-09T14:55:00.438000 #26552] DEBUG -- : Staging raw logs...
moving files from s3://st-snwplw-logs/ to s3://st-snwplw-sl/processing/
F, [2016-05-09T14:55:05.744000 #26552] FATAL -- :

NoMethodError (undefined method `files' for nil:NilClass):
/home/user/loader/snowplow-emr-etl-runner!/gems/sluice-0.2.2/lib/sluice/storage/s3/s3.rb:469:in `process_files'
org/jruby/ext/thread/Mutex.java:149:in `synchronize'
/home/user/loader/snowplow-emr-etl-runner!/gems/sluice-0.2.2/lib/sluice/storage/s3/s3.rb:437:in `process_files'
org/jruby/RubyKernel.java:1511:in `loop'
/home/user/loader/snowplow-emr-etl-runner!/gems/sluice-0.2.2/lib/sluice/storage/s3/s3.rb:428:in `process_files'


#8

Please disregard my previous message. I finally got it working. Thanks for your guidance and help. It was all due to a typo in the destination.

Many thanks!
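The thread does not say what the typo was, so the following is only a general-purpose sketch for sanity-checking bucket paths before a run. It assumes the s3:// scheme seen in the log lines above, and the function name is hypothetical. It catches malformed URIs only; a misspelled but well-formed bucket or prefix still has to be checked against S3 itself.

```python
import re
from urllib.parse import urlparse

# S3 bucket names: 3 to 63 characters, lowercase letters, digits, dots
# and hyphens, starting and ending with a letter or digit.
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def s3_uri_problem(uri):
    """Return a description of what is wrong with an S3 URI, or None."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        return "scheme is %r, expected 's3'" % parsed.scheme
    if not parsed.netloc:
        return "bucket name is missing"
    if not BUCKET_NAME.match(parsed.netloc):
        return "bucket name %r breaks S3 naming rules" % parsed.netloc
    return None


# The first two URIs come from the log output; the third has a deliberate typo.
for uri in ("s3://st-snwplw-logs/",
            "s3://st-snwplw-sl/processing/",
            "s3:/st-snwplw-sl/processing/"):
    print(uri, "->", s3_uri_problem(uri) or "looks well-formed")
```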