While we will continue to evaluate and address any reported security vulnerabilities for a further 6 months, we will no longer add new features or fix new bugs. If you encounter issues with EmrEtlRunner, you should move to the new RDB Loader estate as detailed below.
EmrEtlRunner has been around since the very early days of Snowplow. It was used in older versions of the pipeline to coordinate an AWS EMR batch job that copied events around in S3, enriched them, shredded them, and loaded them into Redshift.
The new RDB shredder runs on EMR as a very simple 2-stage job that copies data in S3 and then shreds it. We recommend using Dataflow Runner to coordinate the EMR job, and we have an example playbook on our docs site. The new RDB loader runs completely outside of EMR as a standalone application.
We now have complete confidence that the new shredder/loader architecture is production-ready, and better than anything we had before. Shredding and loading now run in parallel, and shredding can continue even when the warehouse is unavailable. Furthermore, we have added loads of helpful new features to the standalone loader, such as folder monitoring and runtime metrics.
All previous versions of EmrEtlRunner will still be available on the GitHub releases page, so your pipeline will continue to work.
We recommend following the upgrade guides on the Snowplow docs site to migrate to the newer architecture.