My ETL was not running for several days, and it seems a lot of raw events were collected in the meantime. Usually the ETL runs once per day and takes about 40 minutes. I tried to run the ETL to process all incoming events, and it has now been running for about 9 hours without finishing even the first step (enrich). Is it possible to pass raw events from the incoming S3 directory to the ETL process by date? I want to process all existing events across several runs, one day per run.
@sphinks, unfortunately, there is no automated way to break up the payload. Typically, when this happens, we scale up the EMR cluster so that it can process the accumulated payload. If you want to break the payload down into chunks, you would have to do it manually.
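One way to do the manual split is to move the backlog out of the incoming location into a separate holding prefix, then move it back one day at a time and run the ETL after each move. Below is a minimal sketch of the bookkeeping for that, assuming each raw-event file name contains its date as `YYYY-MM-DD` (CloudFront access logs are named like `DISTID.YYYY-MM-DD-HH.uniqueid.gz`, but check your collector's naming). The `hold/` and `in/` prefixes and both function names are hypothetical. The code only plans the moves; it does not touch S3.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: each raw-event object key embeds its date as YYYY-MM-DD,
# e.g. CloudFront-style names such as "E123ABC.2014-05-12-10.a1b2c3.gz".
DATE_IN_KEY = re.compile(r"(\d{4}-\d{2}-\d{2})")


def group_keys_by_date(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket object keys by the first YYYY-MM-DD date found in each key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        match = DATE_IN_KEY.search(key)
        if match is None:
            groups["undated"].append(key)  # leave these for manual inspection
        else:
            groups[match.group(1)].append(key)
    return dict(groups)


def plan_day_batch(keys: List[str], holding_prefix: str,
                   in_prefix: str) -> List[Tuple[str, str]]:
    """Map each held key back under the incoming prefix (hypothetical layout)."""
    moves = []
    for key in keys:
        if not key.startswith(holding_prefix):
            raise ValueError(f"{key!r} is not under {holding_prefix!r}")
        moves.append((key, in_prefix + key[len(holding_prefix):]))
    return moves


if __name__ == "__main__":
    held = [
        "hold/E123.2014-05-11-23.aaa.gz",
        "hold/E123.2014-05-12-00.bbb.gz",
        "hold/E123.2014-05-12-01.ccc.gz",
    ]
    by_day = group_keys_by_date(held)
    for day in sorted(by_day):
        # One ETL run per day: move this day's files back to the incoming
        # prefix, run the ETL, then continue with the next day.
        for src, dst in plan_day_batch(by_day[day], "hold/", "in/"):
            print(day, src, "->", dst)
```

To carry out each planned move you could use boto3's `copy_object` followed by `delete_object`, or `aws s3 mv`. Note that the date in a log file's name is roughly when the file was written, so a "day" batch may include a few events from around midnight of the neighbouring days.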