EmrEtlRunner throws "contract violation" reading LZO Sink data files

cnamejj · July 19, 2016, 6:11pm

I’ve managed to get the EmrEtlRunner code running, meaning it’s apparently happy with the config file now. But it exits with an error message after moving files from the “good” directory to “processing”.

The raw data it’s reading is the output from the LZO S3 Sink process pulling data from a Kinesis stream that the streaming collector created.

— cut here —
$ snowplow-emr-etl-runner --config runner.conf --resolver iglu-resolver.json
D, [2016-07-19T11:04:22.476000 #22479] DEBUG -- : Staging raw logs...
moving files from s3://smugmug-snowplow-sink/raw/good/ to s3://smugmug-snowplow-sink/raw/processing/
(t0) MOVE smugmug-snowplow-sink/raw/good -> smugmug-snowplow-sink/raw/processing/good.us-east-1.raw
+-> smugmug-snowplow-sink/raw/processing/good.us-east-1.raw
x smugmug-snowplow-sink/raw/good
D, [2016-07-19T11:04:25.603000 #22479] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
D, [2016-07-19T11:05:25.608000 #22479] DEBUG -- : Initializing EMR jobflow
F, [2016-07-19T11:05:25.612000 #22479] FATAL -- :

ContractError (Contract violation for argument 4 of 4:
Expected: String,
Actual: nil
Value guarded in: Snowplow::EmrEtlRunner::EmrJob::get_assets
With Contract: String, String, String, String => Hash
At: /usr/local/bin/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:685 ):
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:69:in `Contract'
org/jruby/RubyProc.java:271:in `call'
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:147:in `failure_callback'
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:164:in `common_method_added'
/usr/local/bin/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
file:/usr/local/bin/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
org/jruby/RubyKernel.java:1091:in `load'
file:/usr/local/bin/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
org/jruby/RubyKernel.java:1072:in `require'
file:/usr/local/bin/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
/tmp/jruby9041311377635623490extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

ihor · July 20, 2016, 12:29am

Hi @cnamejj,

I wonder if you were hit by today’s AWS service degradation in the us-east-1 region, which started around 11AM-12PM PDT and lasted until about 1:30PM PDT.

Please refer to http://status.aws.amazon.com/ for more details.

Could you try rerunning the batch job with the --skip staging option?

If it still fails, what version of the runner have you deployed? What is your configuration file’s content?

–Ihor

cnamejj · July 20, 2016, 1:09am

OK, this was just a missing line in the config file. The error message just wasn’t very easy to interpret.

I was debugging a different problem with storage loader last night, comparing the contract “expect” versus “actual” JSON. It said it didn’t “expect” some lines, which I removed. And it wanted something else to be an array of strings instead of a string, which I fixed.

But since both tools share a common config, the lines I removed were not there when I ran EmrEtlRunner today. The missing line was “hadoop_elasticsearch: 0.1.0”. After adding that back, it’s gone further.

What I don’t know is if “storage loader” will complain about the config with that line added back, but even if it does that’s manageable.
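For anyone hitting the same "Expected: String, Actual: nil" error, here is a minimal Ruby sketch of a pre-flight check that reports which version keys are absent before the runner is started. It is not part of EmrEtlRunner. The helper name `missing_versions`, the `enrich` / `versions` nesting, and the `hadoop_enrich` and `hadoop_shred` key names are assumptions about the config layout; only `hadoop_elasticsearch: 0.1.0` comes from this thread. The other version numbers are placeholders.

```ruby
require 'yaml'

# Assumed key names; only hadoop_elasticsearch is confirmed by this thread.
REQUIRED_VERSIONS = %w[hadoop_enrich hadoop_shred hadoop_elasticsearch].freeze

# Hypothetical helper: returns the required version keys that are missing
# (or not strings) under the assumed enrich -> versions section.
def missing_versions(config, path = %w[enrich versions])
  versions = config.dig(*path) || {}
  REQUIRED_VERSIONS.reject { |key| versions[key].is_a?(String) }
end

# Sample config with the hadoop_elasticsearch line removed.
# The version numbers here are placeholders, not recommendations.
sample = YAML.safe_load(<<~YAML)
  enrich:
    versions:
      hadoop_enrich: 1.7.0
      hadoop_shred: 0.9.0
YAML

puts missing_versions(sample).inspect
```

Run against a real file, `YAML.safe_load(File.read("runner.conf"))` would take the place of the inline sample. An empty result only means the assumed keys are present; it does not validate the rest of the config.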

-jj
