Beam-enrich feature request - Dataflow templated jobs

Hi all,

Recent versions of the Beam SDKs support a templating mechanism that allows dynamic runtime parameters to be passed directly to a Beam job.
I think it would be ‘neato’ if it were possible to compile the job to a Dataflow template, perhaps in addition to the existing Docker/binary options.
Although it's not a massive change, the advantages I can see are:

  • No requirement to initiate the pipeline via Docker/pod or an environment specifically set up for launching Beam pipelines (jdk8 :frowning:) - e.g. managing jobs directly via the CLI/gcloud rather than indirectly via a VM or container environment (see the sketch after this list).
  • Easier upgrade path for users and simpler modification of an older pipeline's parameters.
  • Resource files for the job are pre-staged, so faster startup/shutdown times.
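To make this concrete, here's a rough sketch of what the template flow could look like on the Beam side: run the pipeline once with a templateLocation set, which stages the job graph and dependencies to GCS instead of launching it, and from then on the job can be started purely via gcloud with runtime parameters. The bucket paths, object name and parameter names below are made up for illustration, not anything from our actual setup.

```scala
import org.apache.beam.runners.dataflow.DataflowRunner
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory

object StageTemplate {
  def main(cmdArgs: Array[String]): Unit = {
    val opts = PipelineOptionsFactory
      .fromArgs(cmdArgs: _*)
      .withValidation()
      .as(classOf[DataflowPipelineOptions])

    opts.setRunner(classOf[DataflowRunner])
    // Setting templateLocation tells the Dataflow runner to stage the job graph
    // and its dependencies to GCS instead of launching the job immediately.
    opts.setTemplateLocation("gs://my-bucket/templates/enrich")
    opts.setStagingLocation("gs://my-bucket/staging")

    val pipeline = Pipeline.create(opts)
    // ... build the actual pipeline here, reading runtime values via ValueProvider ...
    pipeline.run()
  }
}

// The staged template can then be launched with no local JDK/Beam environment, e.g.:
//   gcloud dataflow jobs run enrich-run \
//     --region=europe-west1 \
//     --gcs-location gs://my-bucket/templates/enrich \
//     --parameters inputPath=gs://my-bucket/input/*.json
```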

Let me know what you think.

Hi @rbkn, we’ve explored this idea and would like to deliver it at some point. Last time we checked it wasn’t possible, because Scio, the library we use for the Dataflow job, did not expose Beam’s ValueProvider abstraction, so we couldn’t pass runtime parameters to the template. Scio has seen several releases since then and fixing this was on their to-do list, so it’s definitely worth revisiting.
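For reference, the ValueProvider pattern in the raw Beam Java SDK looks roughly like the sketch below; it is only meant to show the abstraction Scio would need to surface. The EnrichOptions trait, the inputPath option and the object name are purely illustrative, not an existing Scio or project API.

```scala
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.options.{Description, PipelineOptions, PipelineOptionsFactory, ValueProvider}

// Hypothetical options interface; Beam builds a runtime proxy for it.
trait EnrichOptions extends PipelineOptions {
  @Description("Input path, only resolved when the template is launched")
  def getInputPath: ValueProvider[String]
  def setInputPath(value: ValueProvider[String]): Unit
}

object RuntimeParamsSketch {
  def main(cmdArgs: Array[String]): Unit = {
    PipelineOptionsFactory.register(classOf[EnrichOptions])
    val opts = PipelineOptionsFactory
      .fromArgs(cmdArgs: _*)
      .withValidation()
      .as(classOf[EnrichOptions])

    val pipeline = Pipeline.create(opts)
    // TextIO accepts a ValueProvider, so the source is only decided at launch
    // time, not when the template is compiled and staged.
    pipeline.apply("ReadInput", TextIO.read().from(opts.getInputPath))
    pipeline.run()
  }
}
```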

Yes, just noticed this.
Looks like it’s still open.