java.lang.OutOfMemoryError: Java heap space BigQuery Loader

#1

I currently have BigQuery Loader set up as a batch Dataflow job that runs every 100 seconds (the most frequent it can be). I constantly get the heap space error, and no data is inserted into BigQuery.

Thanks for any advice you may have!

#2

Hi @James_M,

Sorry to hear about that. All our setups currently use streaming inserts, and we probably should have marked batch jobs as an experimental setting, although we also didn’t encounter this behavior in the tests we ran.

We’re planning to implement the load job API more widely in the future, so you can expect this problem to be fixed in 0.2.0. Meanwhile, I’d recommend either switching to streaming inserts or trying out longer batches.

Are there any technical reasons you prefer the load job API with such a short period?

#3

Hi @anton,

Thanks for getting back to me, great to hear that further development is coming.

I have set up small batch periods because with anything greater than 100 seconds I receive the following error:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Java heap space
        com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:183)
        com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:102)
        org.apache.beam.runners.core.ReduceFnRunner.lambda$onTrigger$1(ReduceFnRunner.java:1057)
        org.apache.beam.runners.core.ReduceFnContextFactory$OnTriggerContextImpl.output(ReduceFnContextFactory.java:438)
        org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
        org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
        org.apache.beam.runners.core.ReduceFnRunner.emit(ReduceFnRunner.java:930)
        org.apache.beam.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:790)

When it is at 100 seconds, I don’t get an error but nothing is inserted.
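
The trace above passes through GroupAlsoByWindowsParDoFn, which suggests the error is thrown while a grouped window is being emitted. As a rough sanity check, and assuming (this is not confirmed anywhere in this thread) that a worker has to hold roughly one window's worth of rows in memory at that point, the expected payload per window can be compared against the available heap. The class name, method names, event rate and event size below are all hypothetical and only illustrate the arithmetic:

```java
public class WindowHeapEstimate {

    /** Rough number of bytes buffered for one window: rate * size * duration. */
    public static long estimateWindowBytes(long eventsPerSecond, long avgEventBytes, long windowSeconds) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(Math.multiplyExact(eventsPerSecond, avgEventBytes), windowSeconds);
    }

    /** True if one window's estimated payload is larger than the heap budget. */
    public static boolean exceedsHeap(long windowBytes, long heapBytes) {
        return windowBytes > heapBytes;
    }

    public static void main(String[] args) {
        long eventsPerSecond = 500;  // hypothetical event rate
        long avgEventBytes = 4_096;  // hypothetical average row size

        // Heap of the JVM running this sketch, NOT of a Dataflow worker;
        // substitute the worker's actual heap size for a meaningful comparison.
        long heapBytes = Runtime.getRuntime().maxMemory();

        for (long windowSeconds : new long[] {100, 600, 3600}) {
            long bytes = estimateWindowBytes(eventsPerSecond, avgEventBytes, windowSeconds);
            System.out.printf("window=%ds estimated=%d MiB exceedsHeap=%b%n",
                    windowSeconds, bytes / (1024 * 1024), exceedsHeap(bytes, heapBytes));
        }
    }
}
```

If the estimate for a given window length comes out close to or above the worker heap, that would be consistent with longer batch periods failing. It is only an estimate, since serialization overhead and the runner's own buffers are not modelled.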