StorageLoader code: where does it read data from shredded/good?


#1

In my prod shredded/good bucket I have shredded events. These events are copied there by a non-prod EMR job, so all of these objects are accessible only to the non-prod account. As a result, StorageLoader fails to access them when it runs.
Can anybody point me to the exact code in StorageLoader where it fetches events from the shredded/good directory?


#2

Hi @ganesh,

Could you clarify your question? In particular, what is the “nonprod emr job” for? StorageLoader (which is a discontinued app, replaced by RDB Loader in R90+) takes shredded events from the shredded.good bucket, which is set in your config.yml.
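To illustrate where that setting lives, here is a minimal sketch that looks up the shredded good location in a config.yml that has already been parsed into a dict (for example with PyYAML). The `aws → s3 → buckets → shredded → good` nesting is an assumption based on the sample EmrEtlRunner/StorageLoader config.yml, so check it against your own file; the helper name and bucket paths are hypothetical.

```python
from typing import Any, Mapping


def shredded_good_location(config: Mapping[str, Any]) -> str:
    """Return the shredded good S3 path from a parsed config.yml.

    Hypothetical helper: assumes the aws.s3.buckets.shredded.good nesting
    of the sample EmrEtlRunner/StorageLoader config.yml.
    """
    try:
        return config["aws"]["s3"]["buckets"]["shredded"]["good"]
    except (KeyError, TypeError) as exc:
        # TypeError covers empty sections, which YAML parses as None.
        raise KeyError("aws.s3.buckets.shredded.good is not set in config") from exc


# A config dict as it might look after parsing config.yml (hypothetical paths).
example_config = {
    "aws": {
        "s3": {
            "buckets": {
                "shredded": {
                    "good": "s3://prod-bucket/shredded/good",
                    "bad": "s3://prod-bucket/shredded/bad",
                }
            }
        }
    }
}

print(shredded_good_location(example_config))  # prints s3://prod-bucket/shredded/good
```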


#3

Hi Anton! Sure.

Currently I am working with two AWS accounts: prod and non-prod.
The S3 bucket and Redshift are in the prod account,
and EMR runs in the non-prod account.

When I start the EMR job in non-prod, it copies data from raw/in into the other directories, such as raw/processing, enriched/good and shredded/good. Because this EMR job runs in the non-prod account, only the non-prod account has access to these objects.

When EMR starts RDB Loader, it throws an S3 Access Denied exception for these objects.
Although they are stored in the prod S3 bucket, only the non-prod account has access to them.
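One way to confirm which objects prod cannot read is to inspect each object's ACL and look for a grant to the prod account. This is a sketch, assuming `s3_client` is a boto3 S3 client built with the non-prod credentials (as the object owner, non-prod can read the ACLs); `prod_canonical_id` is a placeholder for the prod account's canonical user ID, and the function name is hypothetical. The client is passed in so the logic can be exercised with a stub.

```python
from typing import List


def keys_missing_grant(s3_client, bucket: str, prefix: str, prod_canonical_id: str) -> List[str]:
    """List keys under prefix whose ACL grants neither READ nor FULL_CONTROL to prod.

    Sketch only: `s3_client` is assumed to be a boto3 S3 client using the
    NON-PROD credentials; `prod_canonical_id` is a hypothetical placeholder.
    """
    missing = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            acl = s3_client.get_object_acl(Bucket=bucket, Key=obj["Key"])
            # Group grantees have a URI instead of an ID, so .get() returns None.
            granted = any(
                g.get("Grantee", {}).get("ID") == prod_canonical_id
                and g.get("Permission") in ("READ", "FULL_CONTROL")
                for g in acl.get("Grants", [])
            )
            if not granted:
                missing.append(obj["Key"])
    return missing
```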

So what I need is to call setObjectAcl on these objects in shredded/good before loading them into Redshift.
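A minimal sketch of that step in Python, assuming boto3 rather than the Java SDK: it applies the `bucket-owner-full-control` canned ACL to every object under a prefix. It must run with the non-prod credentials, because only the object owner (the account that wrote the objects) can change their ACLs. The function name, profile name, bucket and prefix are hypothetical.

```python
from typing import List


def grant_bucket_owner_full_control(s3_client, bucket: str, prefix: str) -> List[str]:
    """Apply the 'bucket-owner-full-control' canned ACL to every object under prefix.

    Sketch only: `s3_client` is assumed to be a boto3 S3 client created with the
    NON-PROD credentials. Returns the keys whose ACL was updated.
    """
    updated = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            s3_client.put_object_acl(
                Bucket=bucket,
                Key=obj["Key"],
                ACL="bucket-owner-full-control",
            )
            updated.append(obj["Key"])
    return updated


# Example usage (hypothetical profile, bucket and prefix):
# import boto3
# s3 = boto3.Session(profile_name="nonprod").client("s3")
# grant_bucket_owner_full_control(s3, "my-prod-bucket", "shredded/good/")
```

Running this after the EMR shredding step and before RDB Loader starts would give the prod account (the bucket owner) full control over the new objects, so the load can read them.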