In order to load enriched data into Postgres or Redshift, the data needs to be shredded first, i.e. prepared for loading into that specific storage target. This is done by a component called RDB Shredder, which is orchestrated by EmrEtlRunner. Here’s a diagram of the process: https://github.com/snowplow/snowplow/wiki/Batch-pipeline-steps
I think what should work for you is the following process:
- Copy all the enriched folders you want to load back into the bucket EmrEtlRunner reads enriched data from (enriched.good in your config)
- Run EmrEtlRunner, resuming from the RDB Shredder step (see the sketch after this list)
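For illustration, here’s roughly what those two steps could look like from the command line. Treat this as a sketch rather than a verified recipe: the bucket names, run folder, and config paths are placeholders for your own setup, and the exact step names accepted by --resume-from can vary between EmrEtlRunner releases:

```bash
# 1. Copy an archived enriched run folder back into the bucket EmrEtlRunner
#    reads enriched data from (enriched.good in config.yml).
#    All paths below are placeholders; substitute your own.
aws s3 sync \
  s3://my-archive-bucket/enriched/run=2018-11-01-00-00-00/ \
  s3://my-pipeline-bucket/enriched/good/run=2018-11-01-00-00-00/

# 2. Resume EmrEtlRunner from the shred step so it skips staging and
#    enrichment and goes straight to RDB Shredder.
./snowplow-emr-etl-runner run \
  --config config/config.yml \
  --resolver config/iglu_resolver.json \
  --resume-from shred
```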
Bear in mind, however, that Postgres support is very limited and considered experimental at the moment. It does not support shredded entities (e.g. contexts and self-describing events), only atomic data. It also uses a fairly inefficient load process that copies the data to a local machine first, so you need to make sure none of your enriched folders exceeds the EMR master node’s free disk space. I’d refrain from using PostgreSQL at the moment and go with Redshift instead.
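If you do try Postgres anyway, you can sanity-check the size of each enriched run folder against the master node’s free disk before loading. A minimal sketch, assuming a placeholder bucket path:

```bash
# Print the total size of one enriched run folder (placeholder path)
aws s3 ls s3://my-archive-bucket/enriched/run=2018-11-01-00-00-00/ \
  --recursive --summarize --human-readable
```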
The good news, though, is that we have proper Postgres support in mind. No ETA yet, but my hope is that it will be available in 2019.