GCSLoader Dataflow Job error

As per the documentation, I am trying to use the gcloud command to set up the GCS Loader Dataflow job. I am using the command below:

gcloud dataflow jobs run snowplowdemo-collectorcloudstorageloader-job1 \
  --gcs-location=gs://sp-hosted-assets/4-storage/snowplow-google-cloud-storage-loader/0.3.1/SnowplowGoogleCloudStorageLoaderTemplate-0.3.1 \
  --project=project_name \
  --region=us-east1 \
  --worker-zone=us-east1-a \
  --num-workers=1 \
  --service-account-email=service_account_email \
  --parameters inputSubscription=subscription_name,outputDirectory=gs://bucket_name/BadRecords/,outputFilenamePrefix=output,shardTemplate=-W-P-SSSSS-of-NNNNN,outputFilenameSuffix=.txt,windowDuration=5,compression=none,numShards=1

The Dataflow job goes into the failed state with this error:
The supplied parameters for autoscaling and the worker pool size are incorrect. Causes: Streaming autoscaling requires maxNumWorkers to be set.

Can you please suggest the additional parameter that needs to be set?

It’s there in the error message, isn’t it?

That is the error message I am getting. I tried setting the parameter --num-workers=5, but I still get the same error.
I also tried the parameter --maxNumWorkers=5, but then I get the error “unrecognized argument, do you mean numWorkers”.

Can you suggest what I am missing?

Can you paste the full command you are running? numWorkers sets the initial number of workers, whereas maxNumWorkers sets the ceiling on the number of workers when autoscaling.
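
In case it helps: with gcloud dataflow jobs run, the autoscaling ceiling is normally passed as the gcloud flag --max-workers (kebab-case), which would explain why a literal --maxNumWorkers argument is rejected; maxNumWorkers is the name of the underlying pipeline option that the flag corresponds to. A minimal sketch of just the worker flags, with a placeholder job name and an illustrative ceiling value:

# --num-workers sets the initial pool size; --max-workers is the autoscaling
# ceiling (it corresponds to the maxNumWorkers pipeline option).
# "my-gcs-loader-job" and the value 3 are placeholders, not a recommendation.
gcloud dataflow jobs run my-gcs-loader-job \
  --gcs-location=gs://sp-hosted-assets/4-storage/snowplow-google-cloud-storage-loader/0.3.1/SnowplowGoogleCloudStorageLoaderTemplate-0.3.1 \
  --region=us-east1 \
  --num-workers=1 \
  --max-workers=3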

Hi Mike,

Thank you for your prompt reply. I got this working: if I set a value for maxNumWorkers, the job runs fine without issue.
I thought this was an optional parameter, but I have observed that the job fails if I do not pass maxNumWorkers.
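
For anyone who hits the same error later, the full command with the maximum worker count added looks roughly like this (project, subscription, bucket and service account names are placeholders, and I am showing the ceiling via gcloud's --max-workers flag, which is how the maxNumWorkers value gets set for the job):

# Sketch of the full run command; --max-workers=2 is an illustrative value.
gcloud dataflow jobs run snowplowdemo-collectorcloudstorageloader-job1 \
  --gcs-location=gs://sp-hosted-assets/4-storage/snowplow-google-cloud-storage-loader/0.3.1/SnowplowGoogleCloudStorageLoaderTemplate-0.3.1 \
  --project=project_name \
  --region=us-east1 \
  --worker-zone=us-east1-a \
  --num-workers=1 \
  --max-workers=2 \
  --service-account-email=service_account_email \
  --parameters inputSubscription=subscription_name,outputDirectory=gs://bucket_name/BadRecords/,outputFilenamePrefix=output,shardTemplate=-W-P-SSSSS-of-NNNNN,outputFilenameSuffix=.txt,windowDuration=5,compression=none,numShards=1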

Thank you so much for your prompt help. You can close this topic.
