BigQuery Loader 0.2.0: GCP "Please resubmit with a valid region"

We’re giving the new 0.2.0 BigQuery loader a crack.

For some reason, when we run it, it falls back to the default region set by gcloud init: australia-southeast1. It then fails, saying our region is invalid, but the region doesn’t look wrong to me. We’re running BigQuery out of this same region, so I don’t see why this is failing.

[main] INFO org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$DefaultGcpRegionFactory - Using default GCP region australia-southeast1 from gcloud CLI
[main] WARN org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer - Request failed with code 400, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://dataflow.googleapis.com/v1b3/projects/mojito-tracker/locations/australia-southeast1/jobs.
Exception in thread "main" java.lang.RuntimeException: Failed to create a workflow job: (c4e47bef4595e5c1): The workflow could not be created, since it was sent to an invalid or unreleased region. Please resubmit with a valid region.
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:974)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:188)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
        at com.spotify.scio.ScioContext.execute(ScioContext.scala:598)
        at com.spotify.scio.ScioContext$$anonfun$run$1.apply(ScioContext.scala:586)
        at com.spotify.scio.ScioContext$$anonfun$run$1.apply(ScioContext.scala:574)
        at com.spotify.scio.ScioContext.requireNotClosed(ScioContext.scala:694)
        at com.spotify.scio.ScioContext.run(ScioContext.scala:574)
        at com.snowplowanalytics.snowplow.storage.bigquery.loader.Main$.main(Main.scala:25)
        at com.snowplowanalytics.snowplow.storage.bigquery.loader.Main.main(Main.scala)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "(c4e47bef4595e5c1): The workflow could not be created, since it was sent to an invalid or unreleased region. Please resubmit with a valid region.",
    "reason" : "badRequest"
  } ],
  "message" : "(c4e47bef4595e5c1): The workflow could not be created, since it was sent to an invalid or unreleased region. Please resubmit with a valid region.",
  "status" : "INVALID_ARGUMENT"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:417)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1132)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:515)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:448)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
        at org.apache.beam.runners.dataflow.DataflowClient.createJob(DataflowClient.java:61)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:960)
        ... 10 more

I don’t know if I’m reading the CLI code on GitHub correctly, but it looks like most of the BQ loader’s CLI options from the old 0.1.0 version have been removed. BigQuery Loader worked fine for us beforehand, but if this newer version is more stable and better supported, we’d prefer to use it instead.

Hi @robkingston, BigQuery Loader (and Forwarder) are Dataflow jobs, and Dataflow regional endpoints are not supported in all GCP regions. (Full details: https://cloud.google.com/dataflow/docs/concepts/regional-endpoints.)

It is possible to run the Dataflow job in a region different from the one where your BigQuery table is, but this is not new in version 0.2.0; it was also the case in 0.1.0. Can you check which region your current version of the Loader is running in?
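To override the default that Beam picks up from the gcloud CLI, you can pass the standard Dataflow pipeline options on the command line. Below is a minimal sketch of how those options are parsed on the Beam side (the project ID is a placeholder, and it assumes the Dataflow runner is on the classpath so its options are registered):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
    import org.apache.beam.sdk.options.PipelineOptionsFactory

    object RegionCheck {
      def main(args: Array[String]): Unit = {
        // --region overrides the default region that Beam would otherwise
        // pick up from the gcloud CLI (australia-southeast1 in this thread).
        val options = PipelineOptionsFactory
          .fromArgs(
            "--project=my-project",  // placeholder project ID
            "--region=us-central1"   // a region with a Dataflow regional endpoint
          )
          .as(classOf[DataflowPipelineOptions])

        // Job creation requests will now go to the us-central1 endpoint.
        println(s"Dataflow region: ${options.getRegion}")
      }
    }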


Thanks @dilyan, switching to us-central1-a fixed it!

I think I’ve encountered this issue before. I was running it from australia-southeast1-a, but perhaps my region wasn’t being used in the earlier version? Hopefully others will find this useful if they run into this issue.

Dataflow has a bit of a weird quirk: the metadata for jobs has to live in a region with a supported endpoint, but it’s possible to run the actual compute in almost any geography. For example, we run the Dataflow compute VMs in australia-southeast1, whereas the metadata is stored in asia-northeast1.
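To make the split concrete, here’s a sketch using the Beam Dataflow options of that era, where --region selects the endpoint that stores the job metadata and --zone places the worker VMs (the project ID is a placeholder; newer Beam releases deprecate --zone in favour of --workerZone and --workerRegion):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
    import org.apache.beam.sdk.options.PipelineOptionsFactory

    object SplitRegions {
      def main(args: Array[String]): Unit = {
        val options = PipelineOptionsFactory
          .fromArgs(
            "--project=my-project",          // placeholder project ID
            "--region=asia-northeast1",      // endpoint: where job metadata is stored
            "--zone=australia-southeast1-a"  // workers: where the compute VMs run
          )
          .as(classOf[DataflowPipelineOptions])

        println(s"Metadata region: ${options.getRegion}")
        println(s"Worker zone:     ${options.getZone}")
      }
    }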

GCP highlights this in the Dataflow job details, but it isn’t super clear:

“The regional endpoint where metadata is stored and handled for this job. This may be distinct from the zone where a job’s workers are deployed.”
