Replacement for Dataflow runner?

Just wondering if anyone has implemented the shredder service without Dataflow Runner.

Otherwise, I’m looking for an alternative to Dataflow Runner.

An example or code snippet would be appreciated, thanks.

Hey @pramod.niralakeri,

I don’t have any working examples of running Shredder without Dataflow Runner, but it’s certainly not a hard dependency. If I wanted to ditch Dataflow Runner, I’d go for a simple Python boto3 script that launches EMR, something like this gist (I haven’t tested it, and you need to replace all placeholders).
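For readers without access to the gist, here is a rough, untested sketch of what such a boto3 script could look like. It is not the gist itself. The function names, EMR release label, instance types, role names, and the shredder jar path and main class are all placeholders or assumptions, and the `--iglu-config` / `--config` arguments should be checked against the shredder documentation for your version.

```python
def build_shredder_step(jar, main_class, config_b64, resolver_b64):
    """Describe one EMR step that spark-submits the shredder jar.

    jar and main_class are placeholders supplied by the caller; the
    argument names passed to the job are an assumption to verify.
    """
    return {
        "Name": "Shredder",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--class", main_class,
                "--master", "yarn",
                "--deploy-mode", "cluster",
                jar,
                "--iglu-config", resolver_b64,
                "--config", config_b64,
            ],
        },
    }


def build_cluster_request(name, log_uri, subnet_id, step):
    """Build the keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.4.0",  # placeholder, pick the release you need
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m4.large", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "r4.xlarge", "InstanceCount": 1},
            ],
            "Ec2SubnetId": subnet_id,
            # Shut the cluster down once the step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # placeholder role names
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
        "Steps": [step],
    }


def launch(request, region):
    """Start the transient cluster and return its job flow id."""
    import boto3  # imported here so the builders work without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builders separate from `launch` means the cluster definition can be inspected or unit-tested without touching AWS.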

Dataflow Runner gives you a few advantages:

  • Auto-formatting of dates, like run=2022-01-08-23-30-00
  • Base64-encoding
  • Config in plain JSON
  • Locks

But surely, with a boto script you’d have more flexibility.
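Two of the conveniences listed above, date formatting and Base64-encoding, are easy to reproduce in plain Python. The sketch below is illustrative only: `run_id` and `encode_config` are hypothetical helper names, and the use of standard (not URL-safe) Base64 is an assumption to check against what the shredder expects.

```python
import base64
import json
from datetime import datetime, timezone


def run_id(now=None):
    """Format a timestamp as a run folder name, e.g. run=2022-01-08-23-30-00."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("run=%Y-%m-%d-%H-%M-%S")


def encode_config(config):
    """Serialize a plain JSON config and Base64-encode it for the command line."""
    raw = json.dumps(config).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


print(run_id(datetime(2022, 1, 8, 23, 30, 0, tzinfo=timezone.utc)))
# prints run=2022-01-08-23-30-00
```

Locking is not covered here; a script that can be triggered concurrently would need its own mechanism for that.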


I’m trying to stay away from applications and services that require AWS keys. Unfortunately, I can’t provide them to the Snowplow shredder application/repo.

I’m not sure why this is so tightly coupled, whereas other services like the collector, enrich, and the S3 loader sink don’t require them.