Max size of JavaScript Enrichments?


#1

I’m playing around with the JavaScript enrichment process right now - we’re in the early stages of an MVP for our project and trying to figure out how things will work. I created a test enrichment that uses webpack to bundle an MD5 library into the enrichment, which makes the output a decent size (my final enrichment JSON is 7 KB). The enrichment works great in the real-time pipeline on Snowplow Mini, but when I run it through emr-etl-runner, it dies with the following error:

ArgumentError (AWS EMR API Error (ValidationException): 1 validation error detected: Value '[com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob,

... lots of base64 encoded options (iglu_config, enrichments, etc) ...

' at 'steps.2.member.hadoopJarStep.args' failed to satisfy constraint: Member must satisfy constraint: [Member must have length less than or equal to 10280, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*]):
    uri:classloader:/gems/elasticity-6.0.10/lib/elasticity/aws_session.rb:33:in `submit'
    uri:classloader:/gems/elasticity-6.0.10/lib/elasticity/emr.rb:302:in `run_job_flow'
    uri:classloader:/gems/elasticity-6.0.10/lib/elasticity/job_flow.rb:165:in `run'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:474:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:74:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'
    org/jruby/RubyKernel.java:973:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:955:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

I’m assuming from that error message that it’s telling me my enrichments are too large, and sure enough, if I cut the MD5 library out of my test enrichment it works fine. Is there any way I can get around this, or are we just going to be limited in how much code we can put in these enrichments?


#2

Hi @mrosack - you are right - the limiting factor is the total size of the enrichments (plus Iglu resolver), because these configuration files are Base64-encoded and passed to the Hadoop job running in EMR via command-line arguments.

There is a ticket to work around this, but it is as yet unscheduled.
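
To put rough numbers on the limit described above, here is a minimal sketch of the Base64 arithmetic. It assumes the 10280-character ceiling from the error message applies to each Base64-encoded argument, and it treats "7 KB" as exactly 7168 bytes; the function names are made up for illustration and are not part of Snowplow or EMR.

```python
import base64
import math

# Limit copied from the ValidationException in the first post.
# Assumption: it applies to each Base64-encoded command-line argument.
EMR_ARG_LIMIT = 10280


def base64_length(raw_size: int) -> int:
    """Characters of padded Base64 output produced for raw_size input bytes."""
    return 4 * math.ceil(raw_size / 3)


def max_raw_size(limit: int = EMR_ARG_LIMIT) -> int:
    """Largest number of raw bytes whose Base64 encoding still fits in limit."""
    return (limit // 4) * 3


# Stand-in for a 7 KB enrichment JSON (the content does not affect the length).
payload = b"x" * 7168
encoded = base64.b64encode(payload)

print(len(encoded))                  # 9560
print(base64_length(len(payload)))   # 9560, matches the formula
print(max_raw_size())                # 7710
```

Base64 inflates input by about a third, so under these assumptions roughly 7.5 KB of raw JSON is all that fits in one argument. A 7 KB enrichment on its own comes close to that, which would explain why it tips over once the other configuration is included.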


#3

I’ve also just run up against this problem.

I’ve managed to fix it by removing the unused enrichments from the enrichments folder (previously I just had them set to false). I can reduce it some more by minifying my JavaScript.

Are there any other workarounds?


#4

Hey @gareth - no, there are no other workarounds currently, apart from removing any whitespace from the configuration JSONs.
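
For the whitespace suggestion, one possible approach is to re-serialize each configuration JSON with compact separators before it is Base64-encoded. The sketch below uses only the Python standard library; the sample document and the helper names are hypothetical and only there to show the size difference.

```python
import base64
import json


def minify_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace.

    Whitespace inside string values is preserved, so the parsed
    content is unchanged.
    """
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def encoded_length(text: str) -> int:
    """Length of the Base64 encoding of text (UTF-8 bytes)."""
    return len(base64.b64encode(text.encode("utf-8")))


# Hypothetical pretty-printed configuration, for illustration only.
pretty = """{
    "enabled": true,
    "parameters": {
        "script": "some script body"
    }
}"""

compact = minify_json(pretty)

print(compact)  # {"enabled":true,"parameters":{"script":"some script body"}}
print(encoded_length(pretty), encoded_length(compact))
```

The saving depends entirely on how much indentation the original files carry, so it helps most with deeply nested, pretty-printed configs and does nothing for the size of the embedded script itself.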