Unable to start Enrichment Process


#1

Hi,

I am trying to set up Snowplow version 83. While running the enrichment process, I get the error below:

D, [2016-10-03T16:32:45.291000 #13095] DEBUG -- : Initializing EMR jobflow
F, [2016-10-03T16:32:56.951000 #13095] FATAL -- :

**ArgumentError (AWS EMR API Error (ValidationException): Size of step parameter length exceeded the maximum allowed.):**
    uri:classloader:/gems/elasticity-6.0.8/lib/elasticity/aws_session.rb:33:in `submit'
    uri:classloader:/gems/elasticity-6.0.8/lib/elasticity/emr.rb:302:in `run_job_flow'
    uri:classloader:/gems/elasticity-6.0.8/lib/elasticity/job_flow.rb:153:in `run'
    **uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:451:in `run'**
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'
    org/jruby/RubyKernel.java:973:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:955:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

In the config file for emr I am using:

 emr:
    ami_version: 4.5.0      # Don't change this
    region: eu-west-1       # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    ec2_subnet_id: ****** # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: *********
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:                # To launch on cluster, provide version, "0.92.0", keep quotes
      lingual:              # To launch on cluster, provide version, "1.1", keep quotes


**and for enrich**
enrich:
  job_name:  test job # Give your job a name
  versions:
    hadoop_enrich: 1.8.0 # Version of the Hadoop Enrichment process
    hadoop_shred: 0.9.0 # Version of the Hadoop Shredding process
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process

As per the logs, the steps are created in emr_job.rb. We have not added any additional step parameters.

Could you please help me understand the cause of the error?


#2

Hi @ssingh195 - how many enrichments have you got configured - is it more than usual? That error normally indicates that too much data has been sent as the CLI arguments to one of the Hadoop jobs; it’s highly likely that the Base64-encoded enrichments are triggering this.

Try reducing the unnecessary whitespace in your enrichments. We are planning on fixing this longer term by moving the enrichment configs to DynamoDB, but this hasn’t been scheduled yet.
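
To get a feel for how much whitespace contributes, here is a minimal Python sketch that compares the Base64-encoded length of a JSON config as written and after minifying it. The helper name `encoded_length` and the sample config are illustrative only, and this is not how EmrEtlRunner itself builds the step arguments; the exact EMR limit is not stated in this thread, so the sketch only reports sizes.

```python
import base64
import json


def encoded_length(config_text: str, minify: bool = False) -> int:
    """Return the length of the Base64 encoding of a JSON config string.

    Hypothetical helper for illustration; it only approximates what ends
    up on the command line of a Hadoop step.
    """
    if minify:
        # Re-serialise without any insignificant whitespace.
        config_text = json.dumps(json.loads(config_text), separators=(",", ":"))
    return len(base64.b64encode(config_text.encode("utf-8")))


if __name__ == "__main__":
    # Small sample config; substitute the contents of your own JSON files.
    sample = """{
      "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
      "data": {
        "cacheSize": 500,
        "repositories": []
      }
    }"""
    print("as written:", encoded_length(sample))
    print("minified:  ", encoded_length(sample, minify=True))
```

Base64 turns every 3 input bytes into 4 output characters, so each space or newline removed from the JSON saves a little more than one character in the encoded argument.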


#3

Thanks a lot, Alex, that was helpful. The issue was caused by multiple repository entries in the Iglu resolver.

Earlier my iglu_resolver.json was as below, which caused the issue:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      } ,
	  {
        "name": "UDMD Repository",
        "priority": 0,
        "vendorPrefixes": [ "udmd.unileversolutions" ],
        "connection": {
          "http": {
            "uri": "http://s3-eu-west-1.amazonaws.com/udmd-p-schemas"
          }
		}
      } ,
	  {
        "name": "UDMD Repository",
        "priority": 0,
        "vendorPrefixes": [ "udmd.videocomponent" ],
        "connection": {
          "http": {
            "uri": "http://s3-eu-west-1.amazonaws.com/udmd-p-schemas"
          }
		}
      } ,
      {
        "name": "UDMD Repository",
        "priority": 0,
        "vendorPrefixes": [ "udmd.product" ],
        "connection": {
          "http": {
            "uri": "http://s3-eu-west-1.amazonaws.com/udmd-p-schemas"
          }
		}
      } ,
      {
        "name": "UDMD Repository",
        "priority": 0,
        "vendorPrefixes": [ "udmd.event" ],
        "connection": {
          "http": {
            "uri": "http://s3-eu-west-1.amazonaws.com/udmd-p-schemas"
          }
		}
      } ,
      {
        "name": "UDMD Repository",
        "priority": 0,
        "vendorPrefixes": [ "udmd.promotion" ],
        "connection": {
          "http": {
            "uri": "http://s3-eu-west-1.amazonaws.com/udmd-p-schemas"
          }
		}
      } ,
      {
        "name": "UDMD Repository",
        "priority": 0,
        "vendorPrefixes": [ "udmd.page" ],
        "connection": {
          "http": {
            "uri": "http://s3-eu-west-1.amazonaws.com/udmd-p-schemas"
          }
		}
      }	  
    ]
  }
}

The corrected iglu_resolver.json is given below, with the multiple entries merged into one:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "UDMD Repository",
        "priority": 0,
        "vendorPrefixes": [ "udmd.unileversolutions", "udmd.videocomponent", "udmd.product", "udmd.event", "udmd.promotion", "udmd.page" ],
        "connection": {
          "http": {
            "uri": "http://s3-eu-west-1.amazonaws.com/udmd-p-schemas"
          }
        }
      }
    ]
  }
}
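
For anyone with a longer resolver file, the same merge can be done mechanically. Below is a rough Python sketch, assuming that entries count as duplicates when they share the same name, priority and connection; `merge_repositories` is a made-up helper name, not part of any Snowplow or Iglu tooling.

```python
import json


def merge_repositories(resolver: dict) -> dict:
    """Return a copy of the resolver with duplicate repositories merged.

    Assumption: two entries are duplicates when name, priority and
    connection all match. Their vendorPrefixes are combined in order,
    without repeats.
    """
    merged = {}
    for repo in resolver["data"]["repositories"]:
        key = (
            repo["name"],
            repo["priority"],
            json.dumps(repo["connection"], sort_keys=True),
        )
        if key not in merged:
            # Copy the entry so the input structure is left untouched.
            merged[key] = dict(repo, vendorPrefixes=list(repo["vendorPrefixes"]))
        else:
            for prefix in repo["vendorPrefixes"]:
                if prefix not in merged[key]["vendorPrefixes"]:
                    merged[key]["vendorPrefixes"].append(prefix)
    data = dict(resolver["data"], repositories=list(merged.values()))
    return dict(resolver, data=data)


if __name__ == "__main__":
    with open("iglu_resolver.json") as handle:  # path is an assumption
        resolver = json.load(handle)
    # Compact separators also drop the whitespace mentioned earlier in the thread.
    print(json.dumps(merge_repositories(resolver), separators=(",", ":")))
```

Running it on the earlier file should collapse the six "UDMD Repository" entries into one, since they all point at the same URI with the same priority.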

#4

Ah right yes - that would fix it…