Exception in EMR step when loading data into Redshift


#1

I have the RDB Loader and the other required jars in my S3 bucket, and the EMR configuration is fine, but sometimes a ClassNotFoundException (CNFE) like the one below occurs and the EMR step fails…

I don't know why this happens only sometimes; otherwise the same job runs fine. Please help…

Exception in thread "main" java.lang.NoClassDefFoundError: cats/FlatMap
	at com.snowplowanalytics.snowplow.rdbloader.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: cats.FlatMap
	at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Caused by: java.io.IOException: Input/output error
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at sun.misc.Resource.getBytes(Resource.java:124)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
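
Looking at the Caused by chain, the deepest failure is the java.io.IOException: Input/output error raised while URLClassLoader is reading bytes out of the jar, so cats/FlatMap is most likely not missing from the assembly at all; the copy of the jar on the node simply could not be read. A minimal, throwaway check (plain JDK; the jar path is just a placeholder) that forces every entry of the local jar to be read, and therefore fails the same way if the file is truncated or corrupt:

import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical path: point this at the jar the EMR step actually runs.
        String path = args.length > 0 ? args[0] : "snowplow-rdb-loader-0.14.0.jar";
        byte[] buf = new byte[8192];
        try (JarFile jar = new JarFile(path)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // Reading every entry forces the same decompression the classloader
                // does; a truncated or corrupt archive fails here with an IOException.
                try (InputStream in = jar.getInputStream(entry)) {
                    while (in.read(buf) != -1) {
                        // drain
                    }
                }
            }
        }
        System.out.println("All entries readable: " + path);
    }
}

Running this on the EMR master against the jar the step actually uses helps separate a bad download or a failing disk from a genuine packaging problem.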

#2

Hi @Dev, what version of RDB Loader are you using? Did you try to compile it yourself?


#3

There are many jars kept in the assets bucket; I'm not sure which one is used by EMR.


#4

It should be in your config.yml. Could you paste it here? (with credentials removed)


#5

Hey, it's rdb_loader: 0.14.0, and the jar is present in the S3 bucket. Every day I get a new CNFE for a different class, and when I restart the job it runs successfully. I can't figure out the actual problem.
Today I got the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/yaml/snakeyaml/representer/Representer
	at org.yaml.snakeyaml.Yaml.<init>(Yaml.java:64)
	at io.circe.yaml.parser.package$$anonfun$parseSingle$1.apply(package.scala:29)
	at io.circe.yaml.parser.package$$anonfun$parseSingle$1.apply(package.scala:29)
	at cats.syntax.EitherObjectOps$.catchNonFatal$extension(either.scala:267)
	at io.circe.yaml.parser.package$.parseSingle(package.scala:29)
	at io.circe.yaml.parser.package$.parse(package.scala:19)
	at io.circe.yaml.parser.package$.parse(package.scala:23)
	at com.snowplowanalytics.snowplow.rdbloader.config.SnowplowConfig$.parse(SnowplowConfig.scala:49)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$9.apply(CliConfig.scala:137)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$9.apply(CliConfig.scala:137)
	at cats.syntax.EitherOps$.flatMap$extension(either.scala:129)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$.transform(CliConfig.scala:137)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$parse$1.apply(CliConfig.scala:99)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$$anonfun$parse$1.apply(CliConfig.scala:99)
	at scala.Option.map(Option.scala:146)
	at com.snowplowanalytics.snowplow.rdbloader.config.CliConfig$.parse(CliConfig.scala:99)
	at com.snowplowanalytics.snowplow.rdbloader.Main$.main(Main.scala:33)
	at com.snowplowanalytics.snowplow.rdbloader.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.representer.Representer
	at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 24 more
Caused by: java.io.IOException: Input/output error
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at sun.misc.Resource.getBytes(Resource.java:124)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
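
Both traces bottom out in the same java.io.IOException: Input/output error while class bytes are being read out of the jar, and a different class fails on each run, which points at an unreliable read of the jar on the node rather than at the jar's contents. One way to rule out a bad copy is to compare the jar the node downloaded with the object in S3. A rough sketch with the AWS SDK for Java v1 follows; the bucket, key and local path are placeholders, and the ETag only equals the content MD5 for single-part uploads, so treat a mismatch as a hint rather than proof:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class CompareJarWithS3 {
    public static void main(String[] args) throws Exception {
        // All three values are placeholders: use your assets bucket, the key of the
        // loader jar, and the path of the copy the EMR node downloaded.
        String bucket = "my-assets-bucket";
        String key = "jars/snowplow-rdb-loader-0.14.0.jar";
        String localPath = "/tmp/snowplow-rdb-loader-0.14.0.jar";

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // ETag of the object as uploaded to S3.
        String eTag = s3.getObjectMetadata(bucket, key).getETag();

        // MD5 of the local copy, hex-encoded for comparison.
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(Files.readAllBytes(Paths.get(localPath)));
        StringBuilder md5 = new StringBuilder();
        for (byte b : digest) {
            md5.append(String.format("%02x", b));
        }

        System.out.println("S3 ETag   : " + eTag);
        System.out.println("Local MD5 : " + md5);
    }
}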

#6

@Dev, could you please paste your config.yml? It could help us identify the problem.


#7
aws:
  access_key_id:
  secret_access_key:
  s3:
    region:
    buckets:
      assets:  # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: # If you have defined your own JSON Schemas, add the s3:// path to your own JSON Path files in your own bucket here
      log: 
      raw:
        in:                  # Multiple in buckets are permitted
        processing: 
        archive:     # e.g. s3://my-archive-bucket/in
      enriched:
        good:         # e.g. s3://my-out-bucket/enriched/good
        bad:             # e.g. s3://my-out-bucket/enriched/bad
        errors:       # Leave blank unless :continue_on_unexpected_error: set to true below
        archive:     # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good:          # e.g. s3://my-out-bucket/shredded/good
        bad:             # e.g. s3://my-out-bucket/shredded/bad
        errors:    # Leave blank unless :continue_on_unexpected_error: set to true below
        archive:   # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded
  emr:
    ami_version: 5.9.0      # Don't change this
    region:          # Always set this
    jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
    service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
    placement:     # Set this if not running in VPC. Leave blank otherwise
    ec2_subnet_id:  # Set this if running in VPC. Leave blank otherwise
    ec2_key_name: 
    bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
    software:
      hbase:                # Optional. To launch on cluster, provide version, "0.92.0", keep quotes. Leave empty otherwise.
      lingual:              # Optional. To launch on cluster, provide version, "1.1", keep quotes. Leave empty otherwise.
    # Adjust your Spark cluster below
    jobflow:
      job_name: Snowplow ETL PROD
      master_instance_type: m1.medium
      core_instance_count: 1
      core_instance_type:  m4.xlarge
      core_instance_ebs:    # Optional. Attach an EBS volume to each core instance.
        volume_size: 100    # Gigabytes
        volume_type: "gp2"
        volume_iops: 400    # Optional. Will only be used if volume_type is "io1"
        ebs_optimized: false # Optional. Will default to true
      task_instance_count: 0 # Increase to use spot instances
      task_instance_type: m4.xlarge
      task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
    bootstrap_failure_tries: 3 # Number of times to attempt the job in the event of bootstrap failures
    configuration:
      yarn-site:
        yarn.resourcemanager.am.max-attempts: "1"
      spark:
        maximizeResourceAllocation: "true"
    additional_info:        # Optional JSON string for selecting additional features
collectors:
  format: "thrift" # Or 'clj-tomcat' for the Clojure Collector, or 'thrift' for Thrift records, or 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs
enrich:
  versions:
    spark_enrich: 1.13.0 # Version of the Spark Enrichment process
  continue_on_unexpected_error: false # Set to 'true' (and set out_errors: above) if you don't want any exceptions thrown from ETL
  output_compression: GZIP # Compression only supported with Redshift, set to NONE if you have Postgres targets. Allowed formats: NONE, GZIP
storage:
  versions:
    rdb_shredder: 0.13.0        # Version of the Relational Database Shredding process
    rdb_loader: 0.14.0          # Version of the Relational Database Loader app
    hadoop_elasticsearch: 0.1.0 # Version of the Hadoop to Elasticsearch copying process
  #download:
    #folder: 
monitoring:
  tags:  # Name-value pairs describing this job
  logging:
    level: DEBUG # You can optionally switch to INFO for production
  snowplow:
    method: get
    app_id: snowplow # e.g. snowplow
    collector:
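
For completeness, you can sanity-check the file itself by parsing it locally with SnakeYAML, the same parser the loader reaches through circe-yaml in the trace above. A rough sketch, assuming the snakeyaml jar is on the classpath:

import java.io.FileInputStream;
import java.io.InputStream;
import org.yaml.snakeyaml.Yaml;

public class ConfigCheck {
    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "config.yml";
        try (InputStream in = new FileInputStream(path)) {
            // Yaml.load throws on malformed YAML (e.g. inconsistent indentation),
            // so a clean run here rules out the file itself.
            Object parsed = new Yaml().load(in);
            System.out.println("Parsed OK as " + parsed.getClass().getSimpleName());
        }
    }
}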

#8

@chetan, please check the resource limits for your AWS account. You might be exceeding a limit (e.g. multiple jobs using resources at the same time).
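
If you'd rather check the account-level EC2 limits programmatically than in the console, something like the sketch below (AWS SDK for Java v1) prints whatever account attributes the API reports for the region. That said, hitting an instance limit would normally stop the cluster from provisioning rather than cause read errors inside a running step:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.AccountAttribute;
import com.amazonaws.services.ec2.model.AccountAttributeValue;

public class AccountLimits {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        // Prints the account-level attributes the API reports for the region.
        for (AccountAttribute attr : ec2.describeAccountAttributes().getAccountAttributes()) {
            StringBuilder values = new StringBuilder();
            for (AccountAttributeValue v : attr.getAttributeValues()) {
                values.append(v.getAttributeValue()).append(' ');
            }
            System.out.println(attr.getAttributeName() + ": " + values.toString().trim());
        }
    }
}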


#9

I have checked the resource limits; there were enough resources available at that time, and no other EMR job was running.