BigQuery Loader - NullPointerException


#1

Hi everyone,

I’m testing out the BigQuery Loader 0.1.0-rc18 but coming up against a NullPointerException that I can’t seem to solve.

I’ve successfully run mutator create to initialise the BigQuery table, but when I run the BigQuery Loader I get a NullPointerException. Could anyone point me in the right direction to solve this?

My config is:

   "data":{
      "name":"SnowplowBigQuery",
      "id":"ff5176f8-c0e3-4ef0-a94f-3b4f86e042ca",
      "input":"enriched-good",
      "projectId":"my-gcp-project",
      "datasetId":"snowplow",
      "tableId":"events",
      "typesTopic":"bql-types",
      "typesSubscription":"bql-types-sub",
      "badRows":"bql-bad",
      "failedInserts":"bql-failed",
      "load":{
         "mode":"STREAMING_INSERTS",
         "retry":true
      },
      "purpose":"ENRICHED_EVENTS"
   }
}

The error:

[run-main-4] WARN com.spotify.scio.VersionUtil$ - A newer version of Scio is available: 0.6.1 -> v0.7.0-alpha1
[run-main-4] WARN org.apache.beam.sdk.Pipeline - The following transforms do not have stable unique names: saveAsPubsub@{Loader.scala:93}1
[error] (run-main-4) org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NullPointerException
[error] org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NullPointerException
[error]         at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
[error]         at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
[error]         at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
[error]         at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
[error]         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
[error]         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
[error]         at com.spotify.scio.ScioContext$$anonfun$close$1.apply(ScioContext.scala:414)
[error]         at com.spotify.scio.ScioContext$$anonfun$close$1.apply(ScioContext.scala:399)
[error]         at com.spotify.scio.ScioContext.requireNotClosed(ScioContext.scala:466)
[error]         at com.spotify.scio.ScioContext.close(ScioContext.scala:399)
[error]         at com.snowplowanalytics.snowplow.storage.bigquery.loader.Main$.main(Main.scala:22)
[error]         at com.snowplowanalytics.snowplow.storage.bigquery.loader.Main.main(Main.scala)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]         at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.lang.NullPointerException
[error]         at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
[error]         at java.util.AbstractMap.hashCode(AbstractMap.java:530)
[error]         at java.util.AbstractList.hashCode(AbstractList.java:541)
[error]         at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
[error]         at java.util.AbstractMap.hashCode(AbstractMap.java:530)
[error]         at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:206)
[error]         at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:64)
[error]         at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:211)
[error]         at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:168)
[error]         at com.snowplowanalytics.snowplow.storage.bigquery.loader.LoaderRow.hashCode(LoaderRow.scala:40)
[error]         at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:206)
[error]         at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:64)
[error]         at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:211)
[error]         at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:168)
[error]         at scala.util.Right.hashCode(Either.scala:201)
[error]         at java.util.Arrays.hashCode(Arrays.java:4146)
[error]         at java.util.Objects.hash(Objects.java:128)
[error]         at org.apache.beam.sdk.util.WindowedValue$TimestampedValueInGlobalWindow.hashCode(WindowedValue.java:309)
[error]         at java.util.HashMap.hash(HashMap.java:339)
[error]         at java.util.HashMap.get(HashMap.java:557)
[error]         at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:191)
[error]         at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:130)
[error]         at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.HashMultimap.put(HashMultimap.java:48)
[error]         at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:111)
[error]         at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output(ParDoEvaluator.java:260)
[error]         at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:309)
[error]         at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:77)
[error]         at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:621)
[error]         at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:609)
[error]         at com.spotify.scio.util.Functions$$anon$7.processElement(Functions.scala:145)
[error] Nonzero exit code: 1
[error] (Compile / run) Nonzero exit code: 1
[error] Total time: 23 s, completed Oct 17, 2018 10:11:57 AM

#2

Hello @RhysJackson!

I’m very excited that somebody is trying it out already (although I believe it should be quite stable).

Could you please:

  1. Make sure you’re running it with DataflowRunner, not DirectRunner. The latter hasn’t been very stable or predictable.
  2. Post the whole command you’re using to launch the Loader.

Also, what asset are you using? Did you compile it yourself or use the Docker image?

UPD: after looking at the traceback more closely, it seems you are indeed using the default DirectRunner. I’d recommend adding the --runner=DataflowRunner flag.

Also, make sure that the mutator is listening to the types topic. Otherwise, all events containing any new custom schemas will go to the failedInserts topic.


#3

Hi Anton,

Thanks for building the BigQuery loader! You’re right, the --runner=DataflowRunner flag is exactly what I was missing.

I actually tried both Docker and compiling it myself and had no problems (other than the error above), but ended up using the binaries from Bintray. This felt like the best way to reduce human error.

Just for anyone else having similar problems, I used the documentation here to provide the following additional arguments to BigQuery Loader:
--project=my-gcp-project
--region=europe-west1
--tempLocation=gs://my-eu-tmp-bucket/

The GCS temp location and the Dataflow region must both match the region of your BigQuery dataset.
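
For anyone scripting the launch, here is a minimal sketch of how the flags from this thread fit together. The helper name dataflow_args is hypothetical and only assembles the runner, project, region and temp location arguments mentioned above. The Loader's own arguments are left out, and the gs:// check is just a convenience assumption, not something the Loader itself is known to enforce.

```python
from typing import List


def dataflow_args(project: str, region: str, temp_location: str) -> List[str]:
    """Assemble the Dataflow-related flags discussed in this thread.

    Hypothetical helper: it only formats the flags and does not launch anything.
    """
    # Assumption: the temp location should be a GCS path, as in the example above.
    if not temp_location.startswith("gs://"):
        raise ValueError("tempLocation should be a GCS path starting with gs://")
    return [
        "--runner=DataflowRunner",  # avoids falling back to the default DirectRunner
        f"--project={project}",
        f"--region={region}",
        f"--tempLocation={temp_location}",
    ]


args = dataflow_args("my-gcp-project", "europe-west1", "gs://my-eu-tmp-bucket/")
print(" ".join(args))
```

The resulting list can be appended to whatever command you already use to start the Loader. Keeping the region and the bucket in one place makes it harder to mix regions by accident.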

Thanks so much for your help! I can confirm I’ve now got data flowing into BigQuery :smiley: