We are using the Snowplow Scala collector to collect events. It's a standard collection pipeline: the collector sinks events to Kinesis, and kinesis-s3 consumes from Kinesis and writes these events to S3.
Our intention is to use PrestoDB to analyze the S3 files. We’d like to convert these Thrift files to Parquet format, since Parquet, being a columnar format, supposedly performs better for analytical queries. Any suggestions on how to go about that? Also, is it possible to dump the events to S3 directly in Parquet format?