Unexpected error: Java heap space using StorageLoader with Postgres


#1

Hi,

I’ve been trying to ingest a large amount of data recently and noticed that the last day’s worth of data isn’t in our database. I ran the snowplow-runner-and-loader.sh bash script manually to see the error; the output is below.

(t2) DOWNLOAD snowplow-data/shredded/good/run=2016-04-26-23-00-40/atomic-events/part-00052 +-> /home/kevinharrison/postgres/run=2016-04-26-23-00-40/atomic-events/part-00052
(t1) DOWNLOAD snowplow-data/shredded/good/run=2016-04-26-23-00-40/atomic-events/part-00053 +-> /home/kevinharrison/postgres/run=2016-04-26-23-00-40/atomic-events/part-00053
+/> /home/kevinharrison/postgres/run=2016-04-26-23-00-40/atomic-events/part-00044
(t9) DOWNLOAD snowplow-data/shredded/good/run=2016-04-26-23-00-40/atomic-events/part-00054 +-> /home/kevinharrison/postgres/run=2016-04-26-23-00-40/atomic-events/part-00054
Unexpected error: Java heap space
org.jruby.util.ByteList.ensure(ByteList.java:340)
org.jruby.util.io.EncodingUtils.strBufCat(EncodingUtils.java:1042)
org.jruby.util.io.EncodingUtils.encCrStrBufCat(EncodingUtils.java:1127)
org.jruby.RubyString.cat19(RubyString.java:1389)
org.jruby.RubyString.cat19(RubyString.java:1378)
org.jruby.RubyString.append19(RubyString.java:2597)
org.jruby.RubyString.concat19(RubyString.java:2632)
org.jruby.RubyString$INVOKER$i$1$0$concat19.call(RubyString$INVOKER$i$1$0$concat19.gen)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.runtime.callsite.ShiftLeftCallSite.call(ShiftLeftCallSite.java:24)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.WhileNode.interpret(WhileNode.java:131)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.IfNode.interpret(IfNode.java:116)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:219)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
org.jruby.ast.CallTwoArgNode.interpret(CallTwoArgNode.java:59)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)

I believe “Unexpected error: Java heap space” is the main error here. Is there any way the heap space can be increased? Alternatively, I suppose I can scale back my event data.

Thanks,
Kevin


#2

If you want to increase the heap space, you can pass the JVM options -Xms<initial heap size> and -Xmx<maximum heap size> on the command line, so it would be:

java -Xms<initial heap size> -Xmx<maximum heap size> -jar ./snowplow-storage-loader
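
As a rough illustration of the command above with the placeholders filled in, here is a minimal sketch. The sizes 512m and 4g and the helper name heap_opts are assumptions made up for this example, not values recommended in this thread, and the script only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: the heap sizes used below are hypothetical placeholders.

# Build the JVM heap flags from an initial size ($1) and a maximum size ($2).
heap_opts() {
  printf -- '-Xms%s -Xmx%s' "$1" "$2"
}

# Print the full command for inspection instead of executing it.
echo "java $(heap_opts 512m 4g) -jar ./snowplow-storage-loader"
```

Once the printed command looks right, it can be run with whatever arguments the loader is normally given. The maximum heap should stay comfortably below the memory actually available on the machine.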

But it looks like you are loading your data into Postgres. The trouble with Postgres is that the StorageLoader has to download all of your data to local disk and then pipe it through to Postgres. Unlike our load into Redshift, this is not horizontally scalable. If your event volumes are getting to the point of blowing up the StorageLoader, I would strongly recommend moving over to Redshift…


#3

Thank you for the suggestion. It took a little work, but I’ve made the move over to Redshift. From some quick testing, this looks like a much better solution than Postgres. Thanks again.