java.nio.file.FileAlreadyExistsException: ./ip_geo

Yes, this is a new cluster for each new batch/recovery. We’re using snowplow-emr-etl-runner 0.29.0 with yarn.resourcemanager.am.max-attempts: "1", so ./ip_geo shouldn’t exist at startup, and hopefully nothing is retrying the containers.

Presumably it is harmless? I think our problem is more related to the ETL being very slow on larger batches, but that might be a bit of a mystery.