Snowplow System Columns

Hi all, I have the following setup: Application --> Snowplow Scala Collector --> raw stream (Kinesis stream) --> Stream Enrich --> good events and bad events (Kinesis streams).

The Snowplow Scala Collector is receiving data from multiple sources (e.g. Android app, web URL, iPhone). We see that the number of system columns sent by Snowplow varies across these sources. For example:

From the Android app, we get this:
“android mob 2018-07-31 04:50:24.159 2018-07-31 04:50:23.514 2018-07-31 04:50:01.243 unstruct c8e4596a-6039-4701-8a6e-c1eafa09d1a6 abcd andr-0.6.2 ssc-0.13.0-kinesis stream-enrich-0.16.1-common-0.32.0 103.81.237.x 7215a5c4-8b47-4328-8150-8cd29bca539a IN TN Chennai 600126 13.0833 80.2833 Tamil Nadu”

And from web, we get this:
"cfe23a web 2018-07-31 04:56:03.225 2018-07-31 04:56:01.285 2018-07-31 04:56:01.147 unstruct 139d26b2-62d1-4877-91a9-41e3a4186996 cf js-2.9.0 ssc-0.13.0-kinesis stream-enrich-0.16.1-common-0.32.0 103.81.237.x 21095b058ef110f911370ccb5c817dac883ca0b9b0c9088effb1fafb0a929b6f e692049e-7b3f-4748-85cd-75a8d7f470f5 1 35a5ae42-6a7e-4fa6-b27f-ed1866fdc5ab IN TN Chennai 600126 13.0833 80.2833 Tamil Nadu file:///D:/mov_bbb/video4.html file 80 /D:/mov_bbb/video4.html"

As you can see, for the web source we are getting 7 extra columns. We assume this would vary for other sources as well.

Is there a way in Snowplow to restrict the number of system-generated columns sent to the Kinesis streams, so that we get a fixed number of columns for every type of source?

Kindly help me on this.

Regards,
Naren

Each event in the enriched stream should have approximately 130 tab-delimited columns. This is consistent regardless of the source of the data (web, mobile, app, etc.), though some of these columns may be empty (or null) if the data is not present.
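To illustrate the point about empty columns, here is a minimal Python sketch. It assumes a hypothetical, shortened five-column layout (the `COLUMNS` list, the `parse_row` helper, and the sample values are all made up for illustration and are not the real enriched event schema). Splitting strictly on tabs keeps empty fields in place, so every row has the same number of columns; splitting on arbitrary whitespace drops them, which is probably why the rows pasted above look like they have different lengths.

```python
# Hypothetical, shortened column layout for illustration only.
# The real enriched event has roughly 130 columns in a fixed order.
COLUMNS = ["app_id", "platform", "event", "user_id", "page_url"]


def parse_row(line: str) -> dict:
    """Split a tab-delimited row, keeping empty fields so positions stay fixed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    # Map empty strings to None so missing values are explicit.
    return {name: (value if value != "" else None)
            for name, value in zip(COLUMNS, fields)}


# Made-up sample rows: the mobile row leaves user_id and page_url empty.
mobile_row = "android\tmob\tunstruct\t\t"
web_row = "cfe23a\tweb\tunstruct\tu1\tfile:///page.html"

mobile = parse_row(mobile_row)
web = parse_row(web_row)

# Both rows yield the same set of columns; only the values differ.
print(len(mobile), len(web))
print(mobile["page_url"], web["page_url"])

# Splitting on any whitespace instead silently drops the empty fields.
print(len(mobile_row.split()), len(web_row.split()))
```

So rather than restricting columns upstream, it is usually enough to make sure the consumer splits on the tab character only and treats empty strings as missing values.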

Hi Mike, do you know where I can find the order of those columns?

For the order, your best bet is to look through the source of one of the Analytics SDKs: snowplow-scala-analytics-sdk/Event.scala at 5dc24d98b7ce48be827f415cbdba05c2c6989e82 · snowplow/snowplow-scala-analytics-sdk · GitHub

However, whatever it is that you’re doing, I’d suggest parsing the enriched event with one of the Analytics SDKs: Analytics SDK - Snowplow Docs