Duplicate events, using event_id as partition_key

Hi @nacivida,

We have a relatively sophisticated event deduplication engine, powered by Spark and DynamoDB, which is built into our RDB Loader.

At the moment this only works for Postgres and Redshift, and we have not yet extracted this capability out into its own Scala library that can be called from elsewhere.

Separately, we have started work on our port of Snowplow to GCP, which will include loading into BigQuery.

Over time, our BigQuery loader should include an equivalent deduplication engine, although in this case most likely backed by Cloud Bigtable, not DynamoDB. However, this is still some way off.
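
In the meantime, if you want to experiment with the general idea yourself, here is a minimal sketch of deduplication against an external key-value store. To be clear, this is not the RDB Loader's code: the `SeenStore` class, the `fingerprint` field, the `batch_id` parameter and the `deduplicate` function are all hypothetical names chosen for illustration, and a plain in-memory dict stands in for DynamoDB or Bigtable. The assumption is that two events count as duplicates only when both their `event_id` and a content fingerprint match.

```python
from typing import Dict, Iterable, List, Set, Tuple

Event = Dict[str, str]
Key = Tuple[str, str]


class SeenStore:
    """In-memory stand-in (assumption) for an external store such as DynamoDB."""

    def __init__(self) -> None:
        # Maps (event_id, fingerprint) to the batch that first recorded it.
        self._seen: Dict[Key, str] = {}

    def put_if_absent(self, key: Key, batch_id: str) -> bool:
        """Record the key unless it exists; return True if this batch owns it.

        A real store would do this as one atomic conditional write.
        """
        owner = self._seen.setdefault(key, batch_id)
        return owner == batch_id


def deduplicate(events: Iterable[Event], store: SeenStore, batch_id: str) -> List[Event]:
    """Drop duplicates within the batch, then those already seen in earlier batches."""
    kept: List[Event] = []
    in_batch: Set[Key] = set()
    for event in events:
        key = (event["event_id"], event["fingerprint"])
        if key in in_batch:
            continue  # duplicate inside the current batch
        in_batch.add(key)
        if store.put_if_absent(key, batch_id):
            kept.append(event)  # first sighting, or a re-run of the same batch
    return kept


store = SeenStore()
batch_1 = [
    {"event_id": "a", "fingerprint": "f1"},
    {"event_id": "a", "fingerprint": "f1"},  # in-batch duplicate
    {"event_id": "b", "fingerprint": "f2"},
]
batch_2 = [
    {"event_id": "a", "fingerprint": "f1"},  # already seen in batch_1
    {"event_id": "a", "fingerprint": "f9"},  # same id, different content
]
first = deduplicate(batch_1, store, "batch-1")
second = deduplicate(batch_2, store, "batch-2")
```

Storing the owning batch alongside each key is a design choice in this sketch: it means re-running a failed batch does not throw away its own events. Whether same-id, different-content events should be kept as they are here, or handled some other way, depends on your pipeline.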