Stream Enrich throws NoClassDefFoundError

I’m running into a very strange bug with the Scala streaming enricher. I cannot get it working at all: the first time it encounters an event (submitted through the Scala collector), this error is produced:

java.lang.NoClassDefFoundError: scalaz/syntax/ApplicativeBuilder$ApplicativeBuilder3$ApplicativeBuilder4$ApplicativeBuilder5$ApplicativeBuilder6$ApplicativeBuilder7$ApplicativeBuilder8$ApplicativeBuilder9$ApplicativeBuilder10$ApplicativeBuilder11$ApplicativeBuilder12$$anonfun$apply$66
	at scalaz.syntax.ApplicativeBuilder$ApplicativeBuilder3$ApplicativeBuilder4$ApplicativeBuilder5$ApplicativeBuilder6$ApplicativeBuilder7$ApplicativeBuilder8$ApplicativeBuilder9$ApplicativeBuilder10$ApplicativeBuilder11$ApplicativeBuilder12$class.apply(ApplicativeBuilder.scala:141)
	at scalaz.syntax.ApplicativeBuilder$ApplicativeBuilder3$ApplicativeBuilder4$ApplicativeBuilder5$ApplicativeBuilder6$ApplicativeBuilder7$ApplicativeBuilder8$ApplicativeBuilder9$ApplicativeBuilder10$ApplicativeBuilder11$$anon$10.apply(ApplicativeBuilder.scala:131)
	at com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$.enrichEvent(EnrichmentManager.scala:529)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(EtlPipeline.scala:94)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(EtlPipeline.scala:93)
	at scalaz.NonEmptyList$class.map(NonEmptyList.scala:29)
	at scalaz.NonEmptyListFunctions$$anon$4.map(NonEmptyList.scala:164)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(EtlPipeline.scala:93)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(EtlPipeline.scala:91)
	at scalaz.Validation$class.map(Validation.scala:114)
	at scalaz.Success.map(Validation.scala:329)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1.apply(EtlPipeline.scala:91)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1.apply(EtlPipeline.scala:89)
	at scala.Option.map(Option.scala:145)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1.apply(EtlPipeline.scala:89)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1.apply(EtlPipeline.scala:87)
	at scalaz.Validation$class.map(Validation.scala:114)
	at scalaz.Success.map(Validation.scala:329)
	at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$.processEvents(EtlPipeline.scala:87)
	at com.snowplowanalytics.snowplow.enrich.kinesis.sources.AbstractSource.enrichEvents(AbstractSource.scala:184)
	at com.snowplowanalytics.snowplow.enrich.kinesis.sources.AbstractSource$$anonfun$5.apply(AbstractSource.scala:214)
	at com.snowplowanalytics.snowplow.enrich.kinesis.sources.AbstractSource$$anonfun$5.apply(AbstractSource.scala:214)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
	at com.snowplowanalytics.snowplow.enrich.kinesis.sources.AbstractSource.enrichAndStoreEvents(AbstractSource.scala:214)
	at com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource$RawEventProcessor.processRecordsWithRetries(KinesisSource.scala:158)
	at com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource$RawEventProcessor.processRecords(KinesisSource.scala:148)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.V1ToV2RecordProcessorAdapter.processRecords(V1ToV2RecordProcessorAdapter.java:42)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.call(ProcessTask.java:169)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
	at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Please note that for some reason this does not route to the bad stream; instead, the worker crashes outright. Any insight you can offer would be appreciated.

Hello @morgante

This stack trace indicates that you’re missing at least one dependency: scalaz. We use it extensively to aggregate and handle validation errors.

How are you running Stream Enrich? It is supposed to be executed as a fat jar, which bundles all dependencies, including scalaz.

UPD: after a closer look, it seems you actually do have scalaz, but I wonder whether you somehow pulled in a binary-incompatible version of it (we’re using 7.0), perhaps as a transitive dependency? So the question above still stands.

I’m running the fat jar, but not sure where any transitive dependencies could be coming from.

The build is being done in a Docker container (https://hub.docker.com/r/williamyeh/sbt/) using the sbt assembly command from the docs.

I’m definitely not a Scala expert, but so far I’ve been able to trace the problem to this line:

What’s not clear to me is how this line ends up calling scalaz, and how there could be a version incompatibility if I’m building from the repo source.

More precisely, this is caused by one of the lines above it using |@| (the applicative builder operator), which combines scalaz Validations (all those contexts).
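For readers unfamiliar with |@|, here is a minimal sketch of the semantics it provides: combining several validated values while accumulating errors from all failures instead of stopping at the first one. This uses plain Scala Either as a stand-in for scalaz.Validation and made-up error strings; it is an illustration, not Snowplow or scalaz code.

```scala
// Stand-in for what scalaz's |@| (applicative builder) does: combine
// validated values, accumulating errors from ALL failed inputs rather
// than short-circuiting on the first failure.
def combine[A, B, C](va: Either[List[String], A],
                     vb: Either[List[String], B])
                    (f: (A, B) => C): Either[List[String], C] =
  (va, vb) match {
    case (Right(a), Right(b)) => Right(f(a, b))
    case (Left(ea), Left(eb)) => Left(ea ++ eb) // both failed: accumulate
    case (Left(ea), _)        => Left(ea)
    case (_, Left(eb))        => Left(eb)
  }

val ok  = combine[Int, Int, Int](Right(1), Right(2))(_ + _)
val bad = combine[Int, Int, Int](Left(List("bad ip")), Left(List("bad ua")))(_ + _)

println(ok)  // Right(3)
println(bad) // Left(List(bad ip, bad ua))
```

Because the combining code is generated against a specific scalaz ApplicativeBuilder class hierarchy at compile time, loading a binary-incompatible scalaz at runtime produces exactly the kind of NoClassDefFoundError shown above.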

As you said, the real question is where this incompatible version is coming from. Is it possible you added something to the CLASSPATH? I’ve seen this issue a few times on AWS EMR: the Java runtime can grab the wrong JAR from its CLASSPATH even if you explicitly included the correct version in the fat jar.
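One generic way to check where a class is actually being resolved from is to ask the classloader directly. This is a diagnostic sketch (not part of Stream Enrich); run something like it inside the same JVM and classpath as the app, and it prints which jar supplies each class:

```scala
// Generic classpath diagnostic: ask the classloader where it would load
// a given class from. On a real deployment, querying e.g.
// "scalaz.Validation" would reveal which jar the runtime actually picked.
def locate(className: String): Option[String] = {
  val path = className.replace('.', '/') + ".class"
  Option(Thread.currentThread.getContextClassLoader.getResource(path))
    .map(_.toString)
}

// A standard-library class resolves to the scala-library jar:
println(locate("scala.Option"))
// A class that is not on the classpath resolves to None:
println(locate("no.such.Class")) // None
```

Alternatively, starting the JVM with the -verbose:class flag logs every class as it is loaded, along with its source, which can also expose a stray jar shadowing the bundled one.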

But before we start suspecting the CLASSPATH, can you confirm with jar vtf snowplow-stream-enrich-0.8.1 | grep scalaz (or whatever version you’re compiling) that the scalaz classes in the fat jar were compiled somewhere near 21 Apr 2013 (see the date column)?

The output seems to be split into two sections, one recent and one from Apr 2013. Could that be the problem? Output below:

     0 Sun Aug 21 06:22:40 UTC 2016 org/json4s/jackson/scalaz/
     0 Sun Aug 21 06:22:40 UTC 2016 org/json4s/native/scalaz/
     0 Sun Aug 21 06:22:40 UTC 2016 org/json4s/scalaz/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/java/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/java/math/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/java/util/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/java/util/concurrent/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/math/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/util/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/util/parsing/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/std/util/parsing/combinator/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/syntax/
     0 Sun Aug 21 06:22:40 UTC 2016 scalaz/syntax/std/
  1219 Thu Oct 23 18:06:40 UTC 2014 org/json4s/jackson/scalaz/package$$anonfun$JValueShow$1.class
   766 Thu Oct 23 18:06:40 UTC 2014 org/json4s/jackson/scalaz/package$.class
   894 Thu Oct 23 18:06:40 UTC 2014 org/json4s/jackson/scalaz/package.class
  1433 Thu Oct 23 18:06:40 UTC 2014 org/json4s/native/scalaz/package$$anonfun$JValueShow$1.class
  1149 Thu Oct 23 18:06:40 UTC 2014 org/json4s/native/scalaz/package$$anonfun$JValueShow$2.class
   928 Thu Oct 23 18:06:40 UTC 2014 org/json4s/native/scalaz/package$.class

In case it’s helpful, I have uploaded the Dockerfile I’m using. There’s nothing particularly unusual about it, and this is the only jar it builds.

Hey @morgante

This leaves me uncertain whether it is a build problem or a runtime (classpath) problem. I’m going to try to reproduce it with your Dockerfile and will let you know later today.

In the meantime, did you try our precompiled fatjar?

Yes, I just gave that a try and it worked somehow. Unfortunately I can’t use the fatjar in production until this PR is merged, but it’s good for testing.

Did you have any luck with my Dockerfile? Thanks.

Hello @morgante

I’ve just successfully compiled and run Stream Enrich using the provided Dockerfile, and I can confirm that it has the correct scalaz dependency built in.

Let’s concentrate on how you are running this fat jar. Is it a clean EC2 instance, your own box, or something else? What version of Docker are you using? What is the exact command? Are there any other changes besides that PR?

I’m running it from within the Dockerfile I shared with you, using Docker version 1.12.1-rc1. The only change from master is adding the “.resolve” line from that PR.

Hello @morgante

This is extremely odd; to be honest, I don’t know how this is even possible. And the most confusing part is that I cannot reproduce the error.

All I can advise for now is to try using our development box via vagrant provision.