We are in the process of trying out a Spark workflow and would like some feedback on our setup.
I have read the post from Rick at OneSpot on tuning Spark settings. With those applied the cluster does seem to reach full CPU, but I am not sure about the node usage being reported by Spark.
We have a 6 x r3.8xlarge cluster running, and I am going by the figures shown in the YARN ResourceManager UI.
My concern is the "VCores Used" vs "VCores Total" figures: it seems we aren't using all the cores, yet all of the memory is in use and I am seeing ~100% CPU usage in EC2 monitoring.
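
For what it's worth, the vCores gap may just be a reporting quirk: if YARN's CapacityScheduler is running with the DefaultResourceCalculator (the default), it schedules on memory only and the ResourceManager UI counts each container as one vCore, regardless of what spark.executor.cores is set to, so the EC2 CPU metrics are the more trustworthy signal. Switching the scheduler to the DominantResourceCalculator makes the vCore numbers meaningful. Either way it can help to size the executors explicitly; below is a rough sketch of what that might look like for a cluster of this shape. Every number is an illustrative assumption (and assumes YARN's per-node memory limit allows it), not something taken from your setup:

    from pyspark.sql import SparkSession

    # Illustrative sizing for 6 x r3.8xlarge (32 vCPUs / 244 GiB each), leaving
    # a couple of cores and some memory per node for the OS and YARN daemons.
    # None of these numbers come from the original post.
    spark = (
        SparkSession.builder
        .appName("daily-batch")  # hypothetical app name
        .config("spark.executor.instances", "35")  # 6 per node, minus one slot for the driver
        .config("spark.executor.cores", "5")       # ~5 cores per executor is a common rule of thumb
        .config("spark.executor.memory", "30g")
        # spark.yarn.executor.memoryOverhead on Spark 1.x/2.x;
        # spark.executor.memoryOverhead (same meaning) on newer releases.
        .config("spark.yarn.executor.memoryOverhead", "5120")
        .getOrCreate()
    )

With that layout each node would run 6 executors of 5 cores and ~35 GiB (heap plus overhead), i.e. 30 of 32 vCPUs and ~210 of 244 GiB per node.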
Each batch is one day of data, approx 100 GB, of which about 20 GB is bad data (don't ask ;-( ).
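
Separately, for a ~100 GB daily batch it is also worth checking that the input ends up in enough partitions to keep every core busy. A quick sanity check along these lines, where the 2-3 tasks-per-core multiplier is a common rule of thumb, the sizing numbers reuse the sketch above, and the input path is made up:

    # Rough check that the day's input is split into enough partitions to keep
    # all cores busy; too few partitions leaves executors idle even though the
    # cluster looks fully allocated.
    total_cores = 35 * 5                  # executor count x cores per executor, from the sketch above
    target_partitions = total_cores * 3   # 2-3 tasks per core is a common rule of thumb
    df = spark.read.parquet("s3://your-bucket/events/day=2015-06-01/")  # hypothetical path
    if df.rdd.getNumPartitions() < target_partitions:
        df = df.repartition(target_partitions)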