I realize this is a question without a concrete answer, but any general guideline or anecdotal evidence from others who are ingesting large volumes of atomic.events into Redshift is greatly appreciated.
There was some doubt internally that pulling all events into Redshift as atomic.events was a good idea, specifically whether we could run complex, useful SQL queries over such a large table. Is there an order of magnitude at which keeping all atomic events in Redshift stops making sense: 100K events per day, 1M, 10M, 100M, etc.?
If you do need aggregate tables, do most people write EMR jobs to generate them from the raw logs, or do you build them directly in Redshift with SQL (e.g. SELECT INTO)?
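To clarify what I mean by the SQL approach, here is a rough sketch of building an aggregate directly in Redshift (the target table and grouping columns are hypothetical; collector_tstamp and event are standard atomic.events columns):

```sql
-- Hypothetical daily aggregate built straight from atomic.events.
-- Adjust table/column names to your own schema.
CREATE TABLE derived.daily_event_counts AS
SELECT
    collector_tstamp::date AS event_date,  -- truncate timestamp to day
    event,                                 -- Snowplow event type
    COUNT(*)               AS event_count
FROM atomic.events
GROUP BY 1, 2;
```

The alternative would be computing the same rollup in an EMR job over the raw logs and loading only the result into Redshift.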