Hi @vshulga,
You are right in your understanding of the event format for the files produced by the Kinesis S3 sink. More details can be found here: https://github.com/snowplow/snowplow/wiki/Collector-logging-formats#the-snowplow-thrift-raw-event-format
As for the event processing, you would probably want to read the enriched data, not the raw data. Our Analytics SDKs are designed to do just that:
- Scala SDK: https://github.com/snowplow/snowplow-scala-analytics-sdk/
- Python SDK: https://github.com/snowplow/snowplow-python-analytics-sdk
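To illustrate what these SDKs do for you, here is a minimal pure-Python sketch (not the SDK itself) that maps a tab-separated enriched event line onto named fields. Only the first few fields of the canonical enriched event model are listed; the real SDKs cover the full schema and also parse the embedded JSON fields:

```python
# Sketch of what the Analytics SDKs automate: turning one TSV line of
# Snowplow enriched data into a dict keyed by field name.
# FIELDS below is only a small prefix of the canonical event model.
FIELDS = [
    "app_id", "platform", "etl_tstamp", "collector_tstamp",
    "dvce_created_tstamp", "event", "event_id",
]

def parse_enriched(line):
    """Split an enriched-event TSV line and zip it with field names."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

if __name__ == "__main__":
    sample = (
        "web-app\tweb\t2017-01-01 00:00:00\t2017-01-01 00:00:01\t"
        "2017-01-01 00:00:00\tpage_view\t"
        "123e4567-e89b-12d3-a456-426655440000"
    )
    event = parse_enriched(sample)
    print(event["event"])  # page_view
```

In practice you should use the SDKs rather than hand-rolling this, since they track schema changes and handle the self-describing JSON columns for you.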
Thus, you would probably want to have the data enriched before applying analytics.
You might be interested in implementing (at least part of) the Lambda architecture, as described in the tutorial "How to setup a Lambda architecture for Snowplow". You could deploy EmrEtlRunner and run it with the --skip shred option to produce just the enriched events.
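For reference, the invocation would look roughly like the following. The config and resolver file paths are placeholders, and the exact CLI shape varies between EmrEtlRunner versions, so check the docs for the release you deploy:

```shell
# Run the EMR pipeline but stop after enrichment, skipping the shred step
# (config.yml and resolver.json paths are examples, not fixed names).
./snowplow-emr-etl-runner \
  --config config/config.yml \
  --resolver config/resolver.json \
  --skip shred
```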