I have the entire Snowplow pipeline configured and operational, and I am now experimenting with scheduling for the EMR process and storage loader. There aren't really any guidelines on how often to run these processes, apart from some anecdotal recommendations to try once a day and see what works. I'm doing that, but when the job runs it takes a very long time to process all the events, sometimes as long as 1 to 2 days.

I have a two-part question. First, should I deviate from the default EC2 instance type (m1.medium) for the EMR job and use an instance type with a little more CPU power, or is the issue more related to I/O? Second, whether or not I can speed up this process, would it make sense to run the job more frequently? I am using Jenkins, which lets me ensure that only one job runs at a time, so perhaps it would make more sense to attempt the EMR job once every hour or two?

I am running the tracker on a very high-traffic site, so naturally it generates a lot of events. Thanks in advance for any advice.