I am currently looking for a way to prevent certain contexts from being loaded into Redshift. We process these events in our real-time pipeline, but they do not need to be loaded into permanent storage.
The least preferred solution is to load them anyway and delete them after loading.
What I am thinking of now is to initially skip the RDB loader and archive steps in the EMR run, then issue an S3 command to remove the related folders/data from S3, and finally start a new EMR cluster and continue from the RDB load step.
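For reference, the S3 removal step I have in mind could look roughly like the sketch below. It is only an illustration under some assumptions: that the shredded output for a run is laid out as `shredded-types/vendor=<vendor>/name=<name>/...` under the run folder, and that the S3 client is a boto3 client. The function names, bucket, and vendor/context names are hypothetical.

```python
from typing import Any


def context_prefix(run_prefix: str, vendor: str, name: str) -> str:
    """Build the S3 prefix holding one shredded context for a single run.

    Assumes a layout of <run>/shredded-types/vendor=<vendor>/name=<name>/,
    which may differ depending on the pipeline version.
    """
    return f"{run_prefix.rstrip('/')}/shredded-types/vendor={vendor}/name={name}/"


def delete_prefix(s3: Any, bucket: str, prefix: str) -> int:
    """Request deletion of every object under `prefix`.

    `s3` is expected to behave like a boto3 S3 client. Returns the number
    of keys submitted for deletion.
    """
    requested = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Each page holds at most 1000 keys, which is also the
        # per-request limit of delete_objects.
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3.delete_objects(Bucket=bucket, Delete={"Objects": objects})
            requested += len(objects)
    return requested
```

With boto3 installed this would be called as `delete_prefix(boto3.client("s3"), "my-bucket", context_prefix("shredded/good/run=...", "com.acme", "debug_context"))`, once per context that should not reach Redshift, before resuming at the RDB load step.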
Is there a better solution? Is it possible to add a step to the EMR flow? If so, which steps/config do I need to alter?
Thanks in advance!