How to set HDFS replication factor

Hi Team,

Today our ETL job failed, and I have opened a case with AWS support to find out why. The job failed because of missing blocks on HDFS caused by a single node failure, as the replication factor is 1 by default. So the question is: how do we set the replication factor to 2 in the cluster?

Appreciate your help.

Regards!
Deepak Bhatt

By default, Amazon EMR sets the HDFS replication factor to 1 for clusters with fewer than four core nodes. If you increase your cluster size to four or more core nodes, the default replication factor should become 2 (EMR uses 3 for clusters of ten or more core nodes).

Otherwise, if you need to increase this without adding nodes, I think it might be possible to patch the application configuration to increase that property in the emr-etl-runner here. I haven’t tried this, so I can’t confirm that it’ll work, but it may be worth testing.
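
For reference, here is a rough sketch of what such a configuration override could look like. EMR (release 4.x and later) accepts a list of configuration classifications at cluster creation, and the `hdfs-site` classification with the `dfs.replication` property is the usual way to set the HDFS replication factor. The helper name `hdfs_replication_config` is made up for illustration, and whether emr-etl-runner passes such a configuration through to EMR is an assumption I have not verified.

```python
import json


def hdfs_replication_config(factor: int) -> list:
    """Build an EMR configurations list that overrides dfs.replication.

    Hypothetical helper for illustration only; it is not part of
    emr-etl-runner or of any AWS SDK.
    """
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    return [
        {
            # hdfs-site is the EMR classification that maps to hdfs-site.xml
            "Classification": "hdfs-site",
            # EMR expects property values as strings
            "Properties": {"dfs.replication": str(factor)},
        }
    ]


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file and supplied at cluster launch
    print(json.dumps(hdfs_replication_config(2), indent=2))
```

If you launch the cluster yourself, the printed JSON can be saved to a file and supplied with `aws emr create-cluster --configurations file://...`. Keep in mind that a replication factor of 2 only helps if the cluster has at least two core nodes to hold the copies.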