I have run into an issue with the enrich process on AWS. It was working great for almost a year, but for some reason (our traffic is growing, and the number of events with it) the pipeline now fails. In the logs I found the root cause of the failed enrich step:
Container killed by YARN for exceeding memory limits. (5.6Gb of 5.5Gb)
I was using 4 m4.large core instances + 1 master node of the same m4.large type.
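(As far as I understand the first error, the 5.5Gb figure is the YARN container limit, which has to cover the executor's JVM heap plus spark.yarn.executor.memoryOverhead; that overhead defaults to roughly 10% of the executor memory, with a 384MB minimum. Since an m4.large only has 8Gb of RAM in total, there is almost no headroom once YARN takes its share, so the executor goes over the limit and gets killed.)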
I was looking for possible solutions to that issue, and what I found was advice to set the spark.yarn.executor.memoryOverhead option and to increase the number of instances / move to larger instance types (a sketch of the kind of configuration block involved is below, after the error output). I switched to 6 m4.2xlarge nodes and reran the job. Now I get a different error:
The AM container exited with exit code 137. Looking deeper into the logs, I managed to find the explanation for that failure:
java.lang.OutOfMemoryError: Java heap space.
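For reference, the kind of Spark/YARN configuration these recommendations point at can be supplied through the configuration section of the EmrEtlRunner config.yml (assuming a release recent enough to support that block). A minimal sketch is below; the property names are standard Spark and YARN settings, but the values are only my illustrative guesses sized for 8-vCPU / 32Gb nodes such as m4.2xlarge (assuming YARN gets roughly 24Gb per node), not verified recommendations:

```yaml
configuration:
  yarn-site:
    yarn.nodemanager.vmem-check-enabled: "false"   # don't kill containers on the virtual memory check
  spark:
    maximizeResourceAllocation: "false"            # size executors by hand instead of letting EMR decide
  spark-defaults:
    spark.dynamicAllocation.enabled: "false"
    spark.executor.instances: "11"                 # 2 per core node across 6 nodes, leaving one slot for the driver
    spark.executor.cores: "3"
    spark.executor.memory: "10g"                   # JVM heap per executor
    spark.yarn.executor.memoryOverhead: "2048"     # off-heap headroom in MB, counted on top of the heap
    spark.driver.memory: "10g"
    spark.default.parallelism: "66"                # executors x cores x 2
```

With dynamic allocation and maximizeResourceAllocation switched off, each YARN container is sized exactly as the executor heap plus the explicit overhead, which at least makes the "exceeding memory limits" numbers easier to reason about.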
So every time I end up with an out-of-memory issue. I have checked the files in my enrich input directory and they are only about 1.8Gb in total. Any advice on how I can pin down the root cause and fix it? Should I try increasing the number of nodes?
Update: I have tried the custom config recommendation mentioned here, and also tried using 6 r4.2xlarge instances, but still no luck. The monitoring tab for that cluster looks like this:
Still looking for any help on this issue.