Hi @vshulga,
You are right in your understanding of the event format for the files produced by the Kinesis S3 sink. More details can be found here: https://github.com/snowplow/snowplow/wiki/Collector-logging-formats#the-snowplow-thrift-raw-event-format
As for the event processing, you would probably want to read the enriched data, not the raw data. Our Analytics SDKs are designed to do just that:
- Scala SDK: https://github.com/snowplow/snowplow-scala-analytics-sdk/
- Python SDK: https://github.com/snowplow/snowplow-python-analytics-sdk
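To illustrate what these SDKs do for you, here is a minimal pure-Python sketch (not the SDK itself) that maps a tab-separated enriched event line onto named fields. Only the first few fields of the canonical enriched event model are listed; the real SDKs cover the full schema and also parse the embedded JSON fields:

```python
# Sketch of what the Analytics SDKs automate: turning one TSV line of
# Snowplow enriched data into a dict keyed by field name.
# FIELDS below is only a small prefix of the canonical event model.
FIELDS = [
    "app_id", "platform", "etl_tstamp", "collector_tstamp",
    "dvce_created_tstamp", "event", "event_id",
]

def parse_enriched(line):
    """Split an enriched-event TSV line and zip it with field names."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

if __name__ == "__main__":
    sample = (
        "web-app\tweb\t2017-01-01 00:00:00\t2017-01-01 00:00:01\t"
        "2017-01-01 00:00:00\tpage_view\t"
        "123e4567-e89b-12d3-a456-426655440000"
    )
    event = parse_enriched(sample)
    print(event["event"])  # page_view
```

In practice you should use the SDKs rather than hand-rolling this, since they track schema changes and handle the self-describing JSON columns for you.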
Thus, you would probably want to have the data enriched before applying analytics.
You might be interested in implementing (at least part of) the Lambda architecture, as described in the tutorial "How to setup a Lambda architecture for Snowplow". You could deploy EmrEtlRunner and run it with the --skip shred option to produce just the enriched events.
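For reference, the invocation would look roughly like the following. The config and resolver file paths are placeholders, and the exact CLI shape varies between EmrEtlRunner versions, so check the docs for the release you deploy:

```shell
# Run the EMR pipeline but stop after enrichment, skipping the shred step
# (config.yml and resolver.json paths are examples, not fixed names).
./snowplow-emr-etl-runner \
  --config config/config.yml \
  --resolver config/resolver.json \
  --skip shred
```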