I am trying to set up more real-time data loading from S3 to Redshift. I read here that Snowplow's EmrEtlRunner might be able to do this using the persistent job flow option.
How do I use this option? Do I need to start an EMR cluster myself so that EmrEtlRunner can find the persistent cluster? Does the EmrEtlRunner binary run constantly when using this option? Are all the steps listed in the Batch Pipeline Steps run every x minutes?
Please let me know if there is a guide for this, since I can’t seem to find more details about it.