Guys -
I’m stuck debugging an EMR failure. Cluster fails 20mins into the enrich job - monitoring shows a spike in memory allocation near the failure, and investigating logs I found the culprit:
2017-06-02 11:56:20,622 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.StringBuilder.toString(StringBuilder.java:405)
at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:349)
at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:83)
at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2344)
at org.json4s.jackson.JsonMethods$class.compact(JsonMethods.scala:34)
at org.json4s.jackson.JsonMethods$.compact(JsonMethods.scala:50)
at com.snowplowanalytics.snowplow.enrich.common.outputs.BadRow.toCompactJson(BadRow.scala:101)
at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$13$$anonfun$apply$1.apply(EtlJob.scala:189)
at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$13$$anonfun$apply$1.apply(EtlJob.scala:188)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$13.apply(EtlJob.scala:188)
at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$13.apply(EtlJob.scala:182)
at com.twitter.scalding.FlatMapFunction.operate(Operations.scala:46)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
at com.twitter.scalding.MapFunction.operate(Operations.scala:59)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
Monitoring charts
I’m running this cluster with 4x m4.4xlarge
, so it’s unlikely that it doesn’t have enough memory to run this job. Improving instance types and increasing number of instances still result in job failures.
Any ideas what I could do to solve this?
Thanks!!