I’m currently using the Snowplow Docker containers for the collector, enrich, and storage (to S3) steps. Every piece of documentation I can find on RDB Loader indicates I need to run it with EmrEtlRunner. Since I used the Docker containers for the other steps, I never had to set that up and don’t quite know how it fits in. I’ve also seen talk that EmrEtlRunner is being replaced with Dataflow Runner… so I’m a little confused about the current best approach for this.
EmrEtlRunner is indeed still the only recommended way to run RDB Loader today. We still plan to deprecate it in favor of Dataflow Runner, but unfortunately there isn’t even an approximate ETA for that.
It is possible to run RDB Loader in a non-EMR environment, but in the end you will still need to configure EmrEtlRunner to run RDB Shredder, which is a required step for using RDB Loader.
So, unless you want to dive very deep into custom solutions, I’d recommend sticking with EmrEtlRunner, especially as since R102 (I recommend using R104) it comes with a lot of Stream Enrich-related goodness.
Thank you! This is exactly what I was looking for and couldn’t for the life of me find.