AWS Athena as an alternative data store

brandonmp · January 11, 2017, 4:41pm

My questions are:

(a) how, and at what point in the pipeline, should I convert enriched Snowplow logs to ORC?

and (b) would anyone recommend special processing steps for depositing enriched event data into S3 for consumption by AWS Athena? As with Redshift/Postgres storage, I assume I’ll have to set up tables, but beyond that I’m unclear.
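For illustration, the table setup in (b) might look roughly like the sketch below. It only builds the DDL string for an Athena external table over ORC files. The database, table, and bucket names are placeholders, and the columns are a small subset of the enriched event fields with guessed types, so treat it as an assumption about the shape rather than a working schema.

```python
# Hypothetical names throughout: the database, table, and bucket are
# placeholders, and the columns are only a small, illustrative subset
# of the enriched event fields (types are assumptions).
COLUMNS = [
    ("app_id", "string"),
    ("platform", "string"),
    ("collector_tstamp", "timestamp"),
    ("event", "string"),
    ("event_id", "string"),
]


def build_orc_table_ddl(database, table, s3_location, columns=COLUMNS):
    """Return a CREATE EXTERNAL TABLE statement for ORC files stored in S3."""
    # One backtick-quoted column definition per line.
    column_lines = ",\n".join(
        f"  `{name}` {sql_type}" for name, sql_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED AS ORC\n"
        f"LOCATION '{s3_location}'"
    )


if __name__ == "__main__":
    print(
        build_orc_table_ddl(
            "snowplow",
            "enriched_events",
            "s3://my-snowplow-bucket/enriched-orc/",
        )
    )
```

The printed statement would then be run in the Athena query editor. It says nothing about how the ORC files get written in the first place, which is question (a).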

I’d like to set up Snowplow (read: I’m brand new) so that the event data land in an S3 bucket (ideally in ORC file format, as that seems to be the most performant) for consumption via AWS Athena (instead of Postgres, Redshift, et al.).

I’m working through the Snowplow docs, but haven’t been able to determine whether this is a reasonably simple departure from normal operations. Right now I have the CloudFront collector set up, but no ETL.

There’s some discussion suggesting that Athena ought to work with Snowplow, but I’ve found nothing about using it as a data store, and it’s not clear to me how the ETL process has to differ (if at all) from standard enrichment.
