ETL very very slow in larger batches

@bhavin that’s interesting, and probably the same problem I saw. This is the Discourse post that discusses the Spark optimisation learnings; it links to this, which is where the spreadsheet you mention lives. We haven’t tried to optimise our current cluster (master: 1x m1.medium, core: 2x m4.4xlarge), so it’s running with ‘out of the box’ AWS settings. Did you have any success with larger batches using an optimised cluster config?
