Keeping the same EMR cluster for reuse


#1

We are new to Snowplow and EMR on AWS. The “problem” we see is that every time we run EmrEtlRunner it creates a new EMR cluster, which takes ages to set up and then to terminate. The steps themselves do not take long.

Is there a way to set up Snowplow to reuse an existing EMR cluster, so that we avoid starting and terminating a new cluster every time we run EmrEtlRunner?


#2

@pmatsinopoulos, yes, you can run a permanent cluster and add tasks to execute on it with Dataflow Runner.
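
As a rough sketch of what that could look like: as far as I recall, Dataflow Runner has an `up` command that starts a cluster from a cluster config and a `run` command that submits the steps in a playbook to an already running cluster. The small Python helper below only builds the `run` command line. The `--emr-playbook` and `--emr-cluster` flags are written from memory of the Dataflow Runner docs, so please check them against `dataflow-runner --help`. The binary path, `playbook.json` and the cluster ID are hypothetical placeholders.

```python
import shlex


def dataflow_runner_run_cmd(playbook_path, cluster_id, binary="./dataflow-runner"):
    """Build the argv that submits a playbook's steps to an existing EMR cluster.

    Assumption: `run`, `--emr-playbook` and `--emr-cluster` match the installed
    Dataflow Runner version; verify with `dataflow-runner --help`.
    """
    return [
        binary,
        "run",
        "--emr-playbook", playbook_path,  # file describing the steps to add
        "--emr-cluster", cluster_id,      # ID of the long-running cluster
    ]


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your own playbook and cluster ID.
    cmd = dataflow_runner_run_cmd("playbook.json", "j-XXXXXXXXXXXXX")
    # Print the command instead of executing it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Once the printed command looks right, it can be handed to `subprocess.run(cmd, check=True)` from a scheduler. Bear in mind that a permanent cluster is billed while it sits idle, so this trades startup time for cost.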