I also think an hour or so should be fine. You can speed it up by using faster instances, but this will of course increase cost. It all depends on how fast you want to process the data. If you want to run it hourly, I would recommend trying to get the EMR job under half an hour to leave time for the data load.
Another thing to consider is that AWS charges by the hour. A slower instance is cheaper per hour, but if the job needs, e.g., one hour and ten minutes to complete, you're still paying for two hours.
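To make the rounding effect concrete, here is a small sketch that assumes every started hour is billed as a full hour, as described above. The instance prices and run times in the usage lines are made-up numbers for illustration, not real AWS rates.

```python
import math


def billed_hours(runtime_minutes: float) -> int:
    """Hours charged when every started hour is billed as a full hour (assumed billing model)."""
    if runtime_minutes <= 0:
        raise ValueError("runtime must be positive")
    return math.ceil(runtime_minutes / 60)


def run_cost(runtime_minutes: float, hourly_price: float, instance_count: int) -> float:
    """Cost of one run: billed hours x price per instance-hour x number of instances."""
    return billed_hours(runtime_minutes) * hourly_price * instance_count


# Hypothetical comparison: a cheaper, slower instance type that takes 70 minutes
# versus a pricier, faster one that finishes in 50 minutes.
slow = run_cost(70, hourly_price=0.10, instance_count=3)  # billed for 2 hours
fast = run_cost(50, hourly_price=0.18, instance_count=3)  # billed for 1 hour
print(f"slow: ${slow:.2f}, fast: ${fast:.2f}")
```

With these illustrative prices the faster instance ends up cheaper per run, because the slower one spills just past the hour boundary.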
It requires a bit of trial and error in the beginning, and it's very difficult to accurately predict run time from the number of events alone. While a larger number of events will take longer, custom events can have a considerable influence on how long the Shredding step takes.
With regard to your question about the separate buckets, is this what you mean:
do not put your raw:processing inside your raw:in bucket, or your enriched:good inside your raw:processing, or you will create circular references which EmrEtlRunner cannot resolve when moving files.
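If that is the concern, one way to catch it before running EmrEtlRunner is to check that no configured location is a prefix of another. This is a minimal, hypothetical helper (`find_nested_paths` is not part of EmrEtlRunner), and the bucket paths below are placeholders; it only compares path strings and does not talk to S3.

```python
def find_nested_paths(locations: dict[str, str]) -> list[tuple[str, str]]:
    """Return (outer, inner) pairs where inner's path sits inside outer's path."""
    # Normalise to a single trailing slash so "s3://b/raw" does not match "s3://b/raw-archive".
    normalized = {name: path.rstrip("/") + "/" for name, path in locations.items()}
    conflicts = []
    for outer, outer_path in normalized.items():
        for inner, inner_path in normalized.items():
            if outer != inner and inner_path.startswith(outer_path):
                conflicts.append((outer, inner))
    return sorted(conflicts)


# Placeholder bucket layout: raw:processing is (wrongly) nested inside raw:in.
buckets = {
    "raw:in": "s3://my-bucket/raw/in",
    "raw:processing": "s3://my-bucket/raw/in/processing",
    "enriched:good": "s3://my-bucket/enriched/good",
}
print(find_nested_paths(buckets))
```

Keeping each location as a sibling path (for example `raw/in/` and `raw/processing/`) makes the check return an empty list.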