We are working on upgrading our snowplow RT events pipeline and are looking to introduce better engineering standards to our deployment.
We have the standard Snowplow pipeline: Collector → Enrich → RDB Loader → warehouse.
We have the same deployment across 3 environments: DEV, STAGING and PROD.
We have 2 AWS accounts (effectively 2 VPCs): one for DEV + STAGING, and another for PROD.
For deployment and testing we would like to replicate PROD events into the DEV + STAGING environment.
What do you think is the best option for this:
- Replicate the prod stream into collector-good-stream-dev? We tried using a Flink application, but all of the production events went to the dev stream.
- Read the raw events from collector-good-bucket-prod (how?) and use those to replay / replicate across accounts?
- Or maybe there’s a third solution that is better than the above suggestions.
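To make the first bullet concrete, here is a rough sketch of the kind of cross-account replicator we have in mind: read records from the prod collector stream, forward only a sample into the dev stream so dev is not flooded with the full production volume. The stream names, role ARN, and sample rate are hypothetical placeholders, and the shard loop is deliberately naive (a real deployment would use KCL or enhanced fan-out with checkpointing).

```python
import random

PROD_STREAM = "collector-good-stream-prod"  # hypothetical name
DEV_STREAM = "collector-good-stream-dev"    # hypothetical name
CROSS_ACCOUNT_ROLE_ARN = "arn:aws:iam::111111111111:role/snowplow-replicator"  # hypothetical
SAMPLE_RATE = 0.10  # forward ~10% of prod traffic so dev/staging is not flooded


def should_forward(sample_rate: float = SAMPLE_RATE) -> bool:
    """Random sampling keeps the replicated volume manageable; swap in a
    deterministic filter (e.g. on app_id) if you need reproducibility."""
    return random.random() < sample_rate


def replicate_batch(records, dev_kinesis, sample_rate: float = SAMPLE_RATE) -> int:
    """Forward a sampled subset of prod records into the dev stream.

    `records` is a batch as returned by Kinesis get_records; `dev_kinesis`
    is a client built from credentials for the DEV + STAGING account.
    Returns the number of records forwarded."""
    to_put = [
        {"Data": r["Data"], "PartitionKey": r["PartitionKey"]}
        for r in records
        if should_forward(sample_rate)
    ]
    if to_put:
        dev_kinesis.put_records(StreamName=DEV_STREAM, Records=to_put)
    return len(to_put)


def main():
    import boto3  # imported here so the sampling logic above has no AWS dependency

    # Assume a role in the DEV + STAGING account that may write to the dev stream.
    creds = boto3.client("sts").assume_role(
        RoleArn=CROSS_ACCOUNT_ROLE_ARN, RoleSessionName="prod-to-dev-replicator"
    )["Credentials"]
    dev_kinesis = boto3.client(
        "kinesis",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    prod_kinesis = boto3.client("kinesis")

    # Naive single pass over each shard from TRIM_HORIZON, for illustration only.
    desc = prod_kinesis.describe_stream(StreamName=PROD_STREAM)
    for shard in desc["StreamDescription"]["Shards"]:
        it = prod_kinesis.get_shard_iterator(
            StreamName=PROD_STREAM,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",
        )["ShardIterator"]
        batch = prod_kinesis.get_records(ShardIterator=it, Limit=1000)
        replicate_batch(batch["Records"], dev_kinesis)


if __name__ == "__main__":
    main()
```

The sampling step is the part our Flink attempt was missing: without it, the full production volume lands in the dev stream.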
Our goal here is for the staging env. to have both the production events and the testing events. The difference from the prod env. is that staging keeps the data for a much shorter time (e.g. 2 weeks), while still giving us a 'feel' of the production pipeline in staging before we roll the upgrade out to prod.
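For the shorter retention side of this, one option is an S3 lifecycle rule that expires objects in the staging events bucket after 14 days. A minimal sketch, assuming a hypothetical staging bucket name:

```python
RETENTION_DAYS = 14  # "e.g. 2 weeks" from the goal above
STAGING_BUCKET = "collector-good-bucket-staging"  # hypothetical name


def lifecycle_config(days: int = RETENTION_DAYS) -> dict:
    """Build an S3 lifecycle configuration that expires objects after `days`."""
    return {
        "Rules": [
            {
                "ID": f"expire-events-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Expiration": {"Days": days},
            }
        ]
    }


def main():
    import boto3  # imported here so the config builder stays dependency-free

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=STAGING_BUCKET, LifecycleConfiguration=lifecycle_config()
    )


if __name__ == "__main__":
    main()
```

The equivalent could of course live in Terraform/CloudFormation alongside the rest of the deployment; the point is just that retention becomes a one-line difference between staging and prod.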