New Relic charts from our test runs yesterday:
The first two "spikes" (~10% CPU, 500MB RAM), lasting a few minutes each, are the r77 Great Auk EmrEtlRunner moving Clojure collector logs and processing them.
The second two spikes (100% CPU, 1100MB RAM), lasting 2.5 hours each, are an old 0.9.x build with JRuby support backported, processing CloudFront collector logs.
There are a lot more CloudFront collector logs, so volume plays a role, but we don't have the same problem when running the old build with MRI Ruby.
The diff is dead simple too: https://github.com/sspinc/snowplow/pull/5
I still suspect the MRI vs JRuby difference, with the Gemfile bundle potentially playing a role. I will now test the same codebase with MRI to see whether it is fast.