EmrEtlRunner throws "contract violation" reading LZO Sink data files


#1

I’ve managed to get the EmrEtlRunner code running, meaning it’s apparently happy with the config file now. But it exits with an error message after moving files from the “good” directory to “processing”.

The raw data it’s reading is the output from the LZO S3 Sink process pulling data from a Kinesis stream that the streaming collector created.

--- cut here ---
$ snowplow-emr-etl-runner --config runner.conf --resolver iglu-resolver.json
D, [2016-07-19T11:04:22.476000 #22479] DEBUG -- : Staging raw logs...
moving files from s3://smugmug-snowplow-sink/raw/good/ to s3://smugmug-snowplow-sink/raw/processing/
(t0) MOVE smugmug-snowplow-sink/raw/good -> smugmug-snowplow-sink/raw/processing/good.us-east-1.raw
+-> smugmug-snowplow-sink/raw/processing/good.us-east-1.raw
x smugmug-snowplow-sink/raw/good
D, [2016-07-19T11:04:25.603000 #22479] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
D, [2016-07-19T11:05:25.608000 #22479] DEBUG -- : Initializing EMR jobflow
F, [2016-07-19T11:05:25.612000 #22479] FATAL -- :

ContractError (Contract violation for argument 4 of 4:
Expected: String,
Actual: nil
Value guarded in: Snowplow::EmrEtlRunner::EmrJob::get_assets
With Contract: String, String, String, String => Hash
At: /usr/local/bin/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:685 ):
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:69:in `Contract'
org/jruby/RubyProc.java:271:in `call'
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:147:in `failure_callback'
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:164:in `common_method_added'
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
file:/usr/local/bin/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
org/jruby/RubyKernel.java:1091:in `load'
file:/usr/local/bin/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
org/jruby/RubyKernel.java:1072:in `require'
file:/usr/local/bin/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
/tmp/jruby9041311377635623490extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'


#2

Hi @cnamejj,

I wonder if you were hit by today’s AWS service degradation in the us-east-1 region, which started around 11AM-12PM PDT and lasted until about 1:30PM PDT.

Please refer to http://status.aws.amazon.com/ for more details.

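Could you try rerunning the batch job with the --skip staging option? Your log shows the raw files have already been moved to the processing directory, so staging doesn’t need to run again.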

If it still fails, which version of the runner have you deployed, and could you share the contents of your configuration file?

–Ihor


#3

OK, this was just a missing line in the config file. The error message just wasn’t very easy to interpret. :)

I was debugging a different problem with the storage loader last night by comparing the “Expected” and “Actual” JSON in its contract error. It said it didn’t expect some lines, so I removed them. It also wanted another value to be an array of strings instead of a single string, which I fixed.

But since both tools share one config file, the lines I removed were also missing when I ran EmrEtlRunner today. The missing line was “hadoop_elasticsearch: 0.1.0”. After adding it back, the run gets further.
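For anyone who runs into the same thing: a contract error with "Actual: nil" seems to mean a config value the runner expected was never set. Below is a minimal Ruby sketch of a pre-flight check that flags missing keys before the job starts. It is not part of EmrEtlRunner. The helper names (`REQUIRED_PATHS`, `missing_paths`, `check_config_text`) are made up for illustration, and nesting `hadoop_elasticsearch` under `enrich` / `versions` is an assumption about the config layout, so adjust the path to match your own file.

```ruby
require "yaml"

# Key paths that must be present and non-nil in the config.
# Only the hadoop_elasticsearch key comes from this thread; its position
# under enrich -> versions is an assumption about the file layout.
REQUIRED_PATHS = [
  %w[enrich versions hadoop_elasticsearch]
].freeze

# Returns the key paths from `paths` that are missing or nil in `config`.
def missing_paths(config, paths = REQUIRED_PATHS)
  paths.select do |path|
    # Walk down the nested hash one key at a time; stop at nil if a level is absent.
    value = path.reduce(config) { |node, key| node.is_a?(Hash) ? node[key] : nil }
    value.nil?
  end
end

# Parses YAML text and returns the missing paths as dotted strings.
def check_config_text(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  missing_paths(config).map { |path| path.join(".") }
end
```

Calling `check_config_text(File.read("runner.conf"))` returns an empty array when every listed key is set, and otherwise the dotted names of the keys that would reach the runner as nil.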

What I don’t know is if “storage loader” will complain about the config with that line added back, but even if it does that’s manageable.

-jj