GCP Dataflow template for automation using Terraform

Hello Team,

I am trying to automate the creation of Dataflow jobs for the BQ Loader and enricher components using Terraform on GCP. To do so, I need a Dataflow job template for each component.
Could you please let me know whether these templates are already available for all the components? If not, could you guide me through creating them from the Docker images?

The Terraform resource I am planning to use is one of the following:
1. google_dataflow_job, which runs a job from an existing template (if the templates are already available)
2. google_dataflow_flex_template_job (this requires a Flex Template to be created first, for which I need your guidance)

resource "google_dataflow_job" "big_data_job" {
  name              = "dataflow-job"
  template_gcs_path = "gs://my-bucket/templates/template_file"
  temp_gcs_location = "gs://my-bucket/tmp_dir"
  parameters = {
    foo = "bar"
    baz = "qux"
  }
}

resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"
  parameters = {
    inputSubscription = "messages"
  }
}

Please let me know if there is an alternative way to automate the creation of GCP Dataflow jobs using Terraform.

Hello @Srashti ,

For Enrich, please take a look at our Terraform module, which deploys the Snowplow Enrich PubSub service on Compute Engine. We don’t offer a similar module for Beam Enrich, as we plan to deprecate it soon.

For BQ Loader, there is no Terraform module to share at the moment. However, we’ll be releasing a streaming BQ Loader soon, and there will be an open-source Terraform module for that.

Please let us know if you have further questions.

Regards.

Thank you so much for your response.
We are currently creating Dataflow jobs for the enricher, BQ Loader and GCS Loader.
For the manual setup, we run a command like the one below:
docker run \
  snowplow/snowplow-bigquery-loader:0.6.4 \
  --runner=DataFlowRunner \
  --jobName=jobname \
  --project=projectid \
  --streaming=true \
  --region=region \
  --workerZone=zone \
  --gcpTempLocation=bucket_path \
  --config=configfile \
  --resolver=configfile \
  --serviceAccount=serviceaccount

I want to automate this using Terraform. We cannot use the Terraform Dataflow resources directly because we do not have the Flex Templates. Could you please advise whether there is an alternative way to create these Dataflow jobs using Terraform?

Terraform supports Dataflow Flex Templates, but they aren’t currently supported in the official Snowplow Terraform modules. I can’t confirm whether Snowplow will add support for Flex Templates, but given that Enrich is moving away from Dataflow entirely, I don’t think it’ll be a focus.
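
If a Flex Template is not available, one possible workaround is to have Terraform run the same docker command that is currently run by hand, using a null_resource with a local-exec provisioner. The following is only a minimal sketch under stated assumptions, not an official Snowplow module: every variable name, the job name and the resource name are hypothetical, and the flags are simply copied from the manual command in this thread.

```hcl
# Hypothetical input variables; none of these names come from Snowplow.
variable "project_id"      { type = string }
variable "region"          { type = string }
variable "worker_zone"     { type = string }
variable "temp_location"   { type = string } # value for --gcpTempLocation
variable "loader_config"   { type = string } # value for --config
variable "resolver"        { type = string } # value for --resolver
variable "service_account" { type = string }

locals {
  image    = "snowplow/snowplow-bigquery-loader:0.6.4" # image tag from the manual command
  job_name = "bq-loader"                               # hypothetical job name
}

resource "null_resource" "bq_loader_job" {
  # Submit the job again whenever the image or the job name changes.
  triggers = {
    image    = local.image
    job_name = local.job_name
  }

  # Runs on the machine executing "terraform apply", which therefore
  # needs Docker installed.
  provisioner "local-exec" {
    command = <<-EOT
      docker run \
        ${local.image} \
        --runner=DataFlowRunner \
        --jobName=${local.job_name} \
        --project=${var.project_id} \
        --streaming=true \
        --region=${var.region} \
        --workerZone=${var.worker_zone} \
        --gcpTempLocation=${var.temp_location} \
        --config=${var.loader_config} \
        --resolver=${var.resolver} \
        --serviceAccount=${var.service_account}
    EOT
  }
}
```

Keep in mind that with this approach Terraform only records that the command ran. It does not track the Dataflow job itself, so destroying or replacing the resource will not stop a running job, and any credentials or volume mounts your manual command relies on would have to be added to the docker command as well.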

Thank you for letting me know about the plans to deprecate Beam Enrich. I will give Enrich PubSub a try for the enricher component.

We are also planning Dataflow jobs for the other two components, BQ Loader and GCS Loader. Is there any reference available for Flex Templates for these components?

Can you also suggest which GCP managed service I can use to run BQ Loader and GCS Loader?

I don’t think there’s a Terraform template available for this yet, but typically you would run both of these services on Dataflow (streaming).