Should not stage files for enrichment


#1

Hello There,

I am getting this error when running EmrEtlRunner:

F, [2016-04-11T20:00:24.433000 #7633] FATAL -- :

Snowplow::EmrEtlRunner::DirectoryNotEmptyError (Should not stage files for enrichment, s3://bucket-name/logs/enriched/good/ is$
    /home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/s3_tasks.rb:129:in `stage_logs_for_em$
    /home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:51:in `run'
    /home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    file:/home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
    org/jruby/RubyKernel.java:1091:in `load'
    file:/home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    org/jruby/RubyKernel.java:1072:in `require'
    file:/home/ec2-user/market/market-name/lib/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    /tmp/jruby6003993742615468894extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:$

Thanks
Brijesh Kumar Singh


#2

Hi @birju1100,

That error indicates that either

  • The previous EMR job is still in progress while you are trying to launch another EmrEtlRunner instance

or

  • The previous job failed, leaving the log files in processing unarchived (not moved from the raw:processing bucket to the raw:archive bucket).

Hence the error Snowplow::EmrEtlRunner::DirectoryNotEmptyError ("Should not stage files for enrichment").

If the previous job did fail, please check at which step it failed. If the events haven’t been processed/enriched yet, you can simply rerun EmrEtlRunner with the --skip staging option. That prevents another attempt to stage the logs for processing (as some are already there).

If, on the other hand, the previous job failed at the archive_raw step, you might need to manually move the files to the archive bucket and then rerun EmrEtlRunner as usual (without the --skip staging option).
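
To summarise the cases above, here is a small sketch of the decision in Python. It is only an illustration: PreviousRun and next_step are hypothetical names, not part of EmrEtlRunner, and "wait for the running job" is my reading of the first case rather than an official procedure. The code does not touch S3 or EMR; it just maps the state of the previous job to the suggested next step.

```python
from enum import Enum


class PreviousRun(Enum):
    # Hypothetical labels for the state of the previous job (not an EmrEtlRunner API).
    STILL_RUNNING = "still running"
    FAILED_BEFORE_ENRICH = "failed before the events were enriched"
    FAILED_AT_ARCHIVE_RAW = "failed at archive_raw"


def next_step(state: PreviousRun) -> str:
    """Return the suggested recovery step for the given state of the previous job."""
    if state is PreviousRun.STILL_RUNNING:
        # Assumption: launching a second instance is what triggers the error, so wait.
        return "wait for the running job to finish before launching EmrEtlRunner again"
    if state is PreviousRun.FAILED_BEFORE_ENRICH:
        # Logs are already staged in processing, so staging must be skipped.
        return "rerun EmrEtlRunner with --skip staging"
    if state is PreviousRun.FAILED_AT_ARCHIVE_RAW:
        # Enrichment already ran; only the archiving of raw logs is outstanding.
        return ("move the files to the archive bucket manually, "
                "then rerun EmrEtlRunner without --skip staging")
    raise ValueError(f"unhandled state: {state!r}")


if __name__ == "__main__":
    for state in PreviousRun:
        print(f"{state.value}: {next_step(state)}")
```

Running the script prints one line per case, which can serve as a quick checklist once you have worked out from the EMR console at which step the previous job stopped.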

You might find the following flow diagram useful:

Regards,
Ihor