BigQuery Loader - Set job-name

How can I change the Dataflow job name of the BigQuery Loader 0.4.0? I've used the same syntax as for my beam-enrich pipeline, where setting the job name works, but it has no effect on the loader. This matters to me because I want to keep my development and production pipelines separate.

./snowplow-bigquery-loader-${var.bigquery_version}/bin/snowplow-bigquery-loader  --config=$(cat bigquery_config.json | base64 -w 0)  --runner=DataFlowRunner --job-name=bigquery-loader --project=${var.project} --region=${var.region} --gcpTempLocation=gs://${var.project}/temp-files --resolver=$(cat iglu_resolver.json | base64 -w 0)

Hi @sdbeuf. In your config file you can specify a name for the job, using the name parameter.

You can also add labels to help you differentiate between jobs, e.g.:

--labels='{"environment":"prod"}'

Hi @dilyan

The name parameter, nested under data, was already set. Even so, the job is currently always named main-root-….

I dug a little deeper into the docs and used the example in the snippet (https://docs.snowplowanalytics.com/docs/setup-snowplow-on-gcp/setup-bigquery-destination/bigquery-loader-0-4-0/). When I clicked through to https://github.com/snowplow/iglu-central/tree/master/schemas/com.snowplowanalytics.snowplow.storage/bigquery_config/jsonschema I saw that the field names used there are different. Which ones should I use?

Hi @sdbeuf, apologies, I misunderstood your question.

As well as labels that you can pass to the Dataflow job, you can also pass the --jobName=myJobName parameter when launching the job. This will create a job called myJobName.
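
As a rough sketch of how this could be used to keep development and production jobs apart, the snippet below derives a job name per environment and builds the camelCase flag. The ENVIRONMENT variable and the bigquery-loader-<environment> naming pattern are assumptions made for illustration, not something the loader requires.

```shell
# Hypothetical variable: set to "dev" or "prod" per deployment.
ENVIRONMENT="dev"

# Derive a distinct job name per environment
# (kept to lowercase letters, digits and hyphens).
JOB_NAME="bigquery-loader-${ENVIRONMENT}"

# Build the camelCase flag suggested above.
JOB_NAME_FLAG="--jobName=${JOB_NAME}"

# Print the flag; in practice it would replace --job-name=... in the loader command.
echo "${JOB_NAME_FLAG}"
```

With ENVIRONMENT set to dev this prints --jobName=bigquery-loader-dev; the remaining arguments (--config, --resolver, --project and so on) stay as in the original command.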

As for the links you quote, the second one is the schema describing what the config file should look like, and the snippet in the docs is an example of a config that complies with that schema.

Hi @dilyan

That did the trick, thanks. For some reason --job-name works in beam-enrich but not in the loader. I changed it to --jobName in the loader and that works.

Thanks

This may not be a direct answer to the question, but I also ran into this issue.

Checking the Google Dataflow documentation, I noticed a slight difference in syntax between the Java and Python SDKs.

For Python the job name option is job-name, while for Java it is jobName. I guess the bq-loader is implemented using the Java SDK, which would make sense since the loader is a Scala-based application: https://github.com/snowplow-incubator/snowplow-bigquery-loader

See section Setting other Cloud Dataflow pipeline options
https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#java:-sdk-2.x_11
