Does Dataflow Runner replace EmrEtlRunner

sevenm · April 3, 2017, 2:42pm

I am reading up about Dataflow Runner but can't seem to grok whether it completely replaces EmrEtlRunner or is complementary to it. If it replaces it, what is the workflow for then using StorageLoader to get the data into Postgres or Redshift? Do you then need the Dataflow Runner Iglu schema plus the old config.yml?

Thanks

BenFradet · April 3, 2017, 3:38pm

As stated in the RFC, the goal in the long run is to have two components:

Dataflow Runner, which will actually spin up a cluster and run the pipeline
snowplow ctl, which will be in charge of generating configuration files, ready to be fed to Dataflow Runner, from "the old config.yml"

sevenm · April 4, 2017, 2:59am

So Dataflow Runner and snowplow ctl will together replace both EmrEtlRunner and StorageLoader?
Sorry to be dumb here.

BenFradet · April 4, 2017, 8:31am

Indeed, StorageLoader will be turned into an application that runs as part of the EMR jobflow (the one run by Dataflow Runner).

Sorry if this wasn't made clearer. Anyway, this is still a bit far off in the future; everything will be specified in due time.
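To make the split described above a bit more concrete, here is a minimal Python sketch of the "generate a Dataflow Runner playbook from the old config.yml" idea, with loading as just the last step of the same EMR jobflow rather than a separate StorageLoader run. Everything in it is an assumption for illustration: the legacy config keys, the jar paths, the `legacy_to_playbook` helper and the playbook schema URI/version are hypothetical, not the actual snowplow ctl or Dataflow Runner implementation.

```python
import json

# Hypothetical subset of a legacy EmrEtlRunner config.yml, already parsed
# into a dict. Keys and jar paths are illustrative placeholders only.
LEGACY_CONFIG = {
    "aws": {"emr": {"region": "us-east-1"}},
    "enrich_jar": "s3://my-bucket/jars/enrich.jar",
    "shred_jar": "s3://my-bucket/jars/shred.jar",
    "loader_jar": "s3://my-bucket/jars/loader.jar",
}

# Assumed Dataflow Runner playbook schema URI; verify the version against
# the Dataflow Runner documentation before relying on it.
PLAYBOOK_SCHEMA = "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1"


def make_step(name, jar, args):
    """Build one EMR step entry in the (assumed) playbook step format."""
    return {
        "type": "CUSTOM_JAR",
        "name": name,
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": jar,
        "arguments": args,  # real jobs need arguments; omitted in this sketch
    }


def legacy_to_playbook(legacy):
    """Hypothetical config translation: legacy config dict -> playbook dict.

    Loading is modelled as the final step of the same jobflow, instead of
    a separate StorageLoader process run after the EMR cluster finishes.
    """
    steps = [
        make_step("Enrich", legacy["enrich_jar"], []),
        make_step("Shred", legacy["shred_jar"], []),
        make_step("Load", legacy["loader_jar"], []),
    ]
    return {
        "schema": PLAYBOOK_SCHEMA,
        "data": {
            "region": legacy["aws"]["emr"]["region"],
            # "env" means: read AWS credentials from environment variables
            "credentials": {"accessKeyId": "env", "secretAccessKey": "env"},
            "steps": steps,
            "tags": [],
        },
    }


if __name__ == "__main__":
    # Emit the generated playbook as JSON, ready to be written to a file.
    print(json.dumps(legacy_to_playbook(LEGACY_CONFIG), indent=2))
```

The resulting JSON file, together with a cluster config, would then be handed to Dataflow Runner, e.g. `dataflow-runner run-transient --emr-config cluster.json --emr-playbook playbook.json`. Keeping generation and execution in separate tools is what lets the runner stay generic while the Snowplow-specific knowledge lives in the config generator.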

bryce · August 15, 2017, 8:25pm

Hi @BenFradet, for enriching and loading events to Redshift, is Dataflow Runner now the recommended approach? Or is using EmrEtlRunner + Storage Loader still the way to do it?

anton · August 16, 2017, 6:43am

Hi @bryce,

EmrEtlRunner is still the way to go. However, we deprecated StorageLoader in the latest R90 release. @BenFradet's upcoming R91 release will include a new generate command, which should ease the transition, but I believe EmrEtlRunner will remain the default approach for some time even after that.

bryce · August 16, 2017, 3:01pm

Thanks @anton! That’s kind of what I thought after finding and reviewing the recent release notes.
