We have slightly improved our batch pipeline performance this way.
If for any reason the server running the EmrEtlRunner pipeline is in a different region than the "archive_enrich" step (Step 12 in the image), performance can suffer, even if the source (:good) and target (:archive) buckets are in the same region.
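One way to check for this situation is to compare the region of each bucket with the region the EmrEtlRunner server runs in. The snippet below is only a sketch: it assumes boto3 is installed and AWS credentials are configured, and the helper names (`normalize_location`, `bucket_region`, `mismatched`) are made up for illustration and are not part of Snowplow.

```python
from typing import Dict, List, Optional


def normalize_location(constraint: Optional[str]) -> str:
    """Map an S3 LocationConstraint value to a region name."""
    # S3 reports None for us-east-1 and the legacy value "EU" for eu-west-1.
    if constraint is None:
        return "us-east-1"
    if constraint == "EU":
        return "eu-west-1"
    return constraint


def bucket_region(bucket: str) -> str:
    """Look up a bucket's region via boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helpers work without it

    response = boto3.client("s3").get_bucket_location(Bucket=bucket)
    return normalize_location(response.get("LocationConstraint"))


def mismatched(regions: Dict[str, str], expected: str) -> List[str]:
    """Return the names whose region differs from the expected one."""
    return sorted(name for name, region in regions.items() if region != expected)
```

For example, you could build the `regions` dictionary by calling `bucket_region` on your own bucket names, then pass it to `mismatched` together with the region of the server running EmrEtlRunner. Any name it returns is a candidate for the slowdown described above.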
Below you can see our performance when using two different regions and when using only one.
If your collectors are in different regions and log to S3, we recommend enabling S3 Cross-Region Replication to sync the files into your