If you want to catch failures in EmrEtlRunner/StorageLoader when they occur, the safest approach is to wrap the execution of both apps in a monitoring script.
For example, if you are running them from cron, then cronic is a pretty good monitoring wrapper. At the other end of the scale, if you are using something enterprise-y like Chronos on Mesos, that will have failure notification built in.
Because a lot of the EmrEtlRunner/StorageLoader jobflow doesn’t (currently) take place in EMR, it’s really important to capture the full stdout/stderr from a failed run, so you know precisely which step to resume from. Without that output, you often have to do some detective work to figure out where to pick up again (“I can see data in Redshift but some data still in shredded/good, so presumably the archive of shredded events failed partway through?”).
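If you'd rather roll your own wrapper than use cronic, here is a minimal sketch of the idea in Python. It is only an illustration, not anything shipped with Snowplow: the function names `run_and_capture` and `run_pipeline`, the step names, and the log directory layout are all made up for this example, and you would substitute your real EmrEtlRunner and StorageLoader command lines for the placeholder commands. It runs each step in order, writes the combined stdout/stderr of each step to its own log file, stops at the first non-zero exit code, and echoes the failed step's full output to stderr (which cron will typically mail to you).

```python
import subprocess
import sys
from pathlib import Path


def run_and_capture(cmd, log_path):
    """Run cmd, writing its combined stdout/stderr to log_path.

    Returns the process exit code.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def run_pipeline(steps, log_dir):
    """Run (name, cmd) steps in order, stopping at the first failure.

    Returns None if every step succeeded, otherwise a tuple of
    (failed step name, path to that step's log file).
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    for name, cmd in steps:
        log_path = log_dir / f"{name}.log"
        code = run_and_capture(cmd, log_path)
        if code != 0:
            # Stay silent on success; on failure, dump everything so the
            # operator can see exactly where the run stopped.
            sys.stderr.write(f"{name} failed with exit code {code}; full output follows\n")
            sys.stderr.write(log_path.read_text())
            return name, log_path
    return None


# Hypothetical usage (the commands below are placeholders, not real CLI syntax):
#
#   failed = run_pipeline(
#       [
#           ("emr-etl-runner", ["/path/to/your-emr-etl-runner-command"]),
#           ("storage-loader", ["/path/to/your-storage-loader-command"]),
#       ],
#       "/var/log/snowplow-runs",
#   )
#   sys.exit(1 if failed else 0)
```

Keeping one log file per step means that when StorageLoader fails you still have the complete EmrEtlRunner output alongside it, which is usually what you need to decide where to resume.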