Hi there,
We are longtime Snowplow users and have been running the batch enrichment process successfully for years. Just over a year ago we moved to Snowflake as our primary data warehouse. We have had the EMR ETL Runner (v0.34.0) and the Dataflow Runner (v0.4.1) running in production successfully since their release.
In the last month we have started getting transient failures provisioning EMR clusters in AWS, with “Internal Error” as the only message. There are no logs, and none of the Snowplow steps are ever added to the cluster. It happens with both the EMR ETL Runner and the Dataflow Runner. AWS support’s only suggestion was to bump the EMR version to the latest. We are currently on:
Release label: emr-5.9.0
Hadoop distribution: Amazon 2.7.3
Applications: Spark 2.2.0
Are there any known compatibility issues with newer EMR versions, or can we safely bump up to 5.32.0 without upgrading our other Snowplow components?
Thanks for reading!
Brandon