Hi all,
What is the best approach to read the good data in S3 from .tsv.gz files?
I would say that’s going to be pretty dependent on:
What would you like to do with the data?
How large are the keys?
How many keys are you planning on reading at a time?
How are the keys partitioned on S3?
My use case is data analysis: instead of saving to Postgres, we want to use Athena for that.
I don't know the size or read volume yet, because we are just starting to track our application.
There’s a guide here: Using AWS Athena to query the 'good' bucket on S3
It was written quite a while ago, but should get you most of the way if not all the way there.
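If you only need to peek at a single good-bucket file rather than query it with Athena, a minimal Python sketch for decompressing and parsing one .tsv.gz object may help. The bucket/key names and the boto3 call mentioned in the comment are assumptions for illustration, not from this thread:

```python
import csv
import gzip


def read_tsv_gz(stream):
    """Yield rows (lists of strings) from a gzipped TSV byte stream.

    In practice the stream would come from S3, e.g.
    boto3.client("s3").get_object(Bucket="my-good-bucket", Key=key)["Body"]
    (bucket and key here are hypothetical placeholders).
    """
    # gzip.open in text mode decompresses on the fly; csv.reader with a
    # tab delimiter handles quoting/escaping better than str.split("\t").
    with gzip.open(stream, mode="rt", newline="") as text:
        yield from csv.reader(text, delimiter="\t")
```

For example, reading an in-memory gzipped TSV (standing in for an S3 object body):

```python
import io

buf = io.BytesIO()
with gzip.open(buf, mode="wt") as f:
    f.write("app_id\tevent\nmy-app\tpage_view\n")
buf.seek(0)

rows = list(read_tsv_gz(buf))
# rows[0] is the header: ["app_id", "event"]
```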