I have a 28 GB file containing 23 million events. I’m not sure whether it is better to import the complete file and let EMR split it up internally, or to split it into chunks of, let’s say, 30k per file and import those.
Does anyone have experience with which approach causes the fewest problems?
Does anyone know which instance types can handle that workload? I was thinking of 4-5 m4.4xlarge instances.
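
To put the chunking option in numbers, here is a rough back-of-the-envelope sketch. It assumes that 28 GB means 28 × 10^9 bytes, that "30k" means 30,000 events per file, and that events are roughly equal in size. All three are assumptions, and `chunk_plan` is just an illustrative helper name.

```python
import math

# Figures from the question; the interpretation of the units is an assumption.
TOTAL_BYTES = 28 * 10**9       # 28 GB, assuming decimal gigabytes
TOTAL_EVENTS = 23_000_000      # 23 million events
EVENTS_PER_CHUNK = 30_000      # "30k per file", assumed to mean events


def chunk_plan(total_bytes, total_events, events_per_chunk):
    """Estimate the number of chunk files and their average size.

    Assumes events are roughly uniform in size, which may not hold
    for real data.
    """
    num_chunks = math.ceil(total_events / events_per_chunk)
    avg_event_bytes = total_bytes / total_events
    chunk_bytes = avg_event_bytes * events_per_chunk
    return num_chunks, avg_event_bytes, chunk_bytes


num_chunks, avg_event_bytes, chunk_bytes = chunk_plan(
    TOTAL_BYTES, TOTAL_EVENTS, EVENTS_PER_CHUNK
)

print(f"chunks:          {num_chunks}")
print(f"avg event size:  {avg_event_bytes:.0f} bytes")
print(f"avg chunk size:  {chunk_bytes / 10**6:.1f} MB")
```

Under these assumptions, 30k events per file works out to 767 files of about 36.5 MB each, with an average event size of roughly 1.2 KB.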