Happy weekend everyone!
I've reached the last stage of setting up Snowplow for the first time and have the EMR cluster with the Snowflake transformer and loader running via dataflow-runner. However, the transformer job failed after less than 30 seconds with an error saying a bucket is in a different region:
User class threw exception: shadeaws.services.s3.model.AmazonS3Exception: The bucket is in this region: us-west-2. Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect; Request ID: ED792079BBB7BBF2; S3 Extended Request ID: Er/uKjOOYopiHZ0eoY4n8XCU7gPm4Ww1QVgyKVfkinKnJFcZhP17KbNlLMQotUbB+eiNj23ExC4=), S3 Extended Request ID: Er/uKjOOYopiHZ0eoY4n8XCU7gPm4Ww1QVgyKVfkinKnJFcZhP17KbNlLMQotUbB+eiNj23ExC4=
I suspect this is related to our Snowflake warehouse being in us-west-2 while the Snowplow pipeline is in us-west-1. But I don't understand which bucket it's trying to access in us-west-2, especially during the transformer step. The enriched bucket, the transformer output bucket, and the ETL logs bucket are all in us-west-1. The only other bucket I can think of is the Snowplow hosted-assets bucket that the transformer and loader JARs are fetched from.
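To narrow down which bucket the job is actually hitting, one option is to ask S3 for each bucket's region and compare it with the pipeline region. Below is a minimal sketch that assumes boto3 is installed and the credentials are allowed to call GetBucketLocation on every bucket checked. The helper names are hypothetical. For buckets owned by another account, such as a hosted-assets bucket, this call may be denied.

```python
from typing import Callable, Dict, Iterable, Optional


def normalize_location(constraint: Optional[str]) -> str:
    """Map S3's GetBucketLocation LocationConstraint value to a region name."""
    # S3 reports us-east-1 buckets with a null/empty constraint
    # and legacy eu-west-1 buckets with the value "EU".
    if not constraint:
        return "us-east-1"
    if constraint == "EU":
        return "eu-west-1"
    return constraint


def find_region_mismatches(
    buckets: Iterable[str],
    expected_region: str,
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return {bucket: actual_region} for every bucket not in expected_region."""
    mismatches: Dict[str, str] = {}
    for bucket in buckets:
        region = normalize_location(lookup(bucket))
        if region != expected_region:
            mismatches[bucket] = region
    return mismatches


def make_boto3_lookup() -> Callable[[str], Optional[str]]:
    """Build a lookup function backed by a real S3 client (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")

    def lookup(bucket: str) -> Optional[str]:
        # get_bucket_location returns a dict whose "LocationConstraint" may be None.
        return s3.get_bucket_location(Bucket=bucket).get("LocationConstraint")

    return lookup
```

Usage would look like `find_region_mismatches(["my-enriched-bucket", "my-transformed-bucket", "my-etl-logs-bucket"], "us-west-1", make_boto3_lookup())`, where the bucket names are placeholders for the real ones. The lookup is passed in as a function so the comparison logic can be checked without touching AWS.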
Short of setting up the pipeline again in us-west-2 (which I might do anyway, just to have everything in the same region), I don't really know what to do here. Has anyone seen this before?