403 on hosted assets bucket


#1

Hi,

For the last two months or so, we’ve been getting occasional 403s on the hosted assets bucket in eu-central-1:

Data loading error [Amazon](500310) Invalid operation: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 66EEE66DFADD2AF7,ExtRid W+J/VaXJ+7rLQbeLNIUmwTED/sVh1dKjyIpH1xS/mO95pAUhgGoNknR/d+hVOnr/uLlos5rY7rs=,CanRetry 1
Details: 
 -----------------------------------------------
  error:  Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 66EEE66DFADD2AF7,ExtRid W+J/VaXJ+7rLQbeLNIUmwTED/sVh1dKjyIpH1xS/mO95pAUhgGoNknR/d+hVOnr/uLlos5rY7rs=,CanRetry 1
  code:      8001
  context:   s3://snowplow-hosted-assets-eu-central-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json
  query:     445747
  location:  s3_utility.cpp:284
  process:   padbmaster [pid=15962]
  -----------------------------------------------;
ERROR: Data loading error [Amazon](500310) Invalid operation: Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 66EEE66DFADD2AF7,ExtRid W+J/VaXJ+7rLQbeLNIUmwTED/sVh1dKjyIpH1xS/mO95pAUhgGoNknR/d+hVOnr/uLlos5rY7rs=,CanRetry 1
Details: 
 -----------------------------------------------
  error:  Problem reading manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 66EEE66DFADD2AF7,ExtRid W+J/VaXJ+7rLQbeLNIUmwTED/sVh1dKjyIpH1xS/mO95pAUhgGoNknR/d+hVOnr/uLlos5rY7rs=,CanRetry 1
  code:      8001
  context:   s3://snowplow-hosted-assets-eu-central-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json
  query:     445747
  location:  s3_utility.cpp:284
  process:   padbmaster [pid=15962]
  -----------------------------------------------;
Following steps completed: [Discover]
INFO: Logs successfully dumped to S3 [s3://XXXXXXXXXXX/log/rdb-loader/2018-04-10-22-01-02/a5430d55-3546-484a-9ee0-c63357540df3]

I’ve tried a few things to work around this issue. First, I added that bucket to our IAM configuration, as suggested in another thread; that didn’t fix it, and the error still happens sometimes. Now I’ve mirrored the entire assets bucket and set the mirror in config.yml at aws.s3.buckets.assets. Strangely enough, the load still reads from the official hosted assets bucket.

I’m not sure what else to do, hence this post.


#2

I hadn’t realised that the issue was in the Redshift cluster. Adding that bucket to the policy of the cluster’s IAM role did fix it for now, but I’ll keep an eye on it and see if it resurfaces.
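
For anyone hitting the same error, here is a minimal sketch of the kind of read-only policy document involved. It is an assumption about what such a policy could look like, not the exact one attached here: the function name hosted_assets_read_policy is made up for illustration, and the only detail taken from the log above is the bucket name. The script just builds and prints the JSON, which would then have to be attached to the IAM role that the Redshift cluster uses.

```python
import json

# Bucket name taken from the "context" line of the error above.
# The region suffix will differ for clusters in other regions.
ASSETS_BUCKET = "snowplow-hosted-assets-eu-central-1"


def hosted_assets_read_policy(bucket: str) -> dict:
    """Build a read-only IAM policy document for a single S3 bucket.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside it,
                # e.g. the JSONPaths files under 4-storage/.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = hosted_assets_read_policy(ASSETS_BUCKET)
print(json.dumps(policy, indent=2))
```

The two statements are separate because s3:ListBucket is granted on the bucket ARN while s3:GetObject is granted on the object ARNs under it.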

It would be nice if the Redshift queries also used the bucket that’s configured in config.yml, though.