Reading raw events from kinesis-s3 log

Hi @vshulga,

Your understanding of the event format for the files produced by Kinesis S3 is correct. More details can be found here: https://github.com/snowplow/snowplow/wiki/Collector-logging-formats#the-snowplow-thrift-raw-event-format

As for event processing, you will probably want to read the enriched data rather than the raw events, which means having the data enriched first and applying analytics afterwards. Our Analytics SDKs are designed to read exactly that enriched output.
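
To give a rough idea of why enriched data is easier to work with than the raw Thrift records: an enriched event is a single tab-separated line of text. Below is a minimal sketch, not the Analytics SDK itself, that picks out the first few columns of such a line using only the Python standard library. The helper name parse_leading_fields and the sample values are made up for illustration, and the column order is an assumption based on the canonical event model; the real format has many more columns, and the SDK also handles things like the JSON contexts for you.

```python
# Leading columns of an enriched event line, in order.
# Assumption: order taken from the canonical event model; the full
# enriched format contains many more columns after these.
LEADING_FIELDS = [
    "app_id",
    "platform",
    "etl_tstamp",
    "collector_tstamp",
    "dvce_created_tstamp",
    "event",
    "event_id",
]


def parse_leading_fields(line):
    """Map the first columns of one enriched TSV line to field names.

    Hypothetical helper for illustration only. Empty columns become None.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) < len(LEADING_FIELDS):
        raise ValueError(
            "expected at least %d tab-separated columns, got %d"
            % (len(LEADING_FIELDS), len(values))
        )
    # zip stops at the shorter list, so any trailing columns are ignored.
    return {name: (value or None) for name, value in zip(LEADING_FIELDS, values)}


# Made-up sample line: seven leading columns plus one extra trailing column.
sample = "\t".join([
    "my-app",
    "web",
    "2017-01-01 00:00:05.000",
    "2017-01-01 00:00:00.000",
    "",  # dvce_created_tstamp left empty on purpose
    "page_view",
    "c6ef3124-b53a-4b13-a233-0088f79dcbcb",
    "ignored-trailing-column",
]) + "\n"

event = parse_leading_fields(sample)
print(event["app_id"], event["event"], event["event_id"])
```

For anything beyond a quick look at the data, the Analytics SDK for your language is the better choice, since it knows the full column list and the types.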

You might want to consider implementing part of the setup described in the Lambda architecture guide: How to setup a Lambda architecture for Snowplow. You could deploy EmrEtlRunner and run it with the --skip shred option so that it produces just the enriched events.
