[Solved] Output of Stream Enricher is not stable and does not have a fixed length

Hello Team,

I am new to Snowplow, and I have tried to implement a POC on a single machine. I have set up the Snowplow collector with Kafka v1.0.1 from Bitnary and then the StreamEnricher v1.0.0, using the default configuration and the default resolver.json. The pipeline seemed to work perfectly, but when I tried to convert the TSV enriched events into JSON (as I want my events to be stored in PostgreSQL) I was not able to work with the enriched data. I tried the Snowplow Analytics SDK, but the tool was unable to parse my events. I then tried to implement my own converter using the predefined event fields as the header, but the result was not correct. I believe some values are missing from my enriched data: they are supposed to be null/empty, but in my case they are completely absent. As far as I know from the documentation, the enriched events are expected to have the same order and format, but in my case the output is different every time (for example, one event had 90 tab-separated values and another had 120; please refer to the samples below). Is this how it is supposed to work, or am I missing something?


First event:

sample-app-https web 2021-02-08 17:46:09.249 2021-02-08 17:46:04.2162021-02-08 17:46:03.816 page_ping 3e2aeae6-997e-4e06-ad98-28a4154c60ac bc js-2.17.0 ssc-1.0.1-kafka stream-enrich-1.0.0-common-1.0.0 127.0.0.x 9d8043ec-7af3-4032-b5c0-b8bbc13695be 22 3041729e-f9d6-4457-8afa-08472823d900 https://name.blob.core.windows.net/$web/index.html Snowplow Sample Webapp https name.blob.core.windows.net 443 /$web/index.html {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0","data":{"id":"0c5c4158-75fd-4f77-b2c1-00296aedb127"}},{"schema":"iglu:org.w3/PerformanceTiming/jsonschema/1-0-0","data":{"navigationStart":1612799751980,"unloadEventStart":1612799752114,"unloadEventEnd":1612799752114,"redirectStart":0,"redirectEnd":0,"fetchStart":1612799751984,"domainLookupStart":1612799752028,"domainLookupEnd":1612799752028,"connectStart":1612799752028,"connectEnd":1612799752074,"secureConnectionStart":1612799752041,"requestStart":1612799752075,"responseStart":1612799752100,"responseEnd":1612799752102,"domLoading":1612799752120,"domInteractive":1612799752162,"domContentLoadedEventStart":1612799752162,"domContentLoadedEventEnd":1612799752162,"domComplete":1612799752193,"loadEventStart":1612799752193,"loadEventEnd":1612799752193}}]} 00 0 0 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.x Safari/537.36 Edg/88.0.705.x en-US 1 0 0 0 0 0 0 0 0 124 1920 979 Europe/Berlin 1920 1080 windows-1252 1920 979 2021-02-08 17:46:03.822 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0","data":{"useragentFamily":"Chrome","useragentMajor":"88","useragentMinor":"0","useragentPatch":"4324","useragentVersion":"Chrome 88.0.4324","osFamily":"Windows","osMajor":"10","osMinor":null,"osPatch":null,"osPatchMinor":null,"osVersion":"Windows 
10","deviceFamily":"Other"}}]} 90721053-fb4b-48eb-98af-846447892921 2021-02-08 17:46:04.210 com.snowplowanalytics.snowplow page_ping jsonschema 1-0-0 7a6e6032767b86db5f5e928baa6e42d1


Second event :

sample-app-https web 2021-02-09 08:49:31.722 2021-02-09 08:49:29.354 2021-02-09 08:49:29.002 page_ping 9cf44045-9dd3-4448-81c7-1a12af186da8 bc js-2.17.0 ssc-1.0.1-kafka stream-enrich-1.0.0-common-1.0.0 127.0.0.x 9d8043ec-7af3-4032-b5c0-b8bbc13695be 23 474cbc6f-2d8d-49cb-8b0d-de213ce7fe2c https://name.blob.core.windows.net/$web/index.html Snowplow Sample Webapp https name.blob.core.windows.net 443 /$web/index.html {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0","data":{"id":"0c5c4158-75fd-4f77-b2c1-00296aedb127"}},{"schema":"iglu:org.w3/PerformanceTiming/jsonschema/1-0-0","data":{"navigationStart":1612799751980,"unloadEventStart":1612799752114,"unloadEventEnd":1612799752114,"redirectStart":0,"redirectEnd":0,"fetchStart":1612799751984,"domainLookupStart":1612799752028,"domainLookupEnd":1612799752028,"connectStart":1612799752028,"connectEnd":1612799752074,"secureConnectionStart":1612799752041,"requestStart":1612799752075,"responseStart":1612799752100,"responseEnd":1612799752102,"domLoading":1612799752120,"domInteractive":1612799752162,"domContentLoadedEventStart":1612799752162,"domContentLoadedEventEnd":1612799752162,"domComplete":1612799752193,"loadEventStart":1612799752193,"loadEventEnd":1612799752193}}]} 00 0 0 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.x Safari/537.36 Edg/88.0.705.x en-US10 0 0 0 0 0 0 0 1 24 1920 979 Europe/Berlin 1920 1080 windows-1252 1920 979 2021-02-09 08:49:29.005 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0","data":{"useragentFamily":"Chrome","useragentMajor":"88","useragentMinor":"0","useragentPatch":"4324","useragentVersion":"Chrome 88.0.4324","osFamily":"Windows","osMajor":"10","osMinor":null,"osPatch":null,"osPatchMinor":null,"osVersion":"Windows 
10","deviceFamily":"Other"}}]} 9f6f461c-fd14-469d-a1ac-099bdb646c99 2021-02-09 08:49:29.351 com.snowplowanalytics.snowplow page_ping jsonschema 1-0-08526709d49fb16e402da1ed514f1d8ba


The first event has around 83 tab-separated values while the second one has 90. Also, please notice the wrong output at the end of the second event: in 1-0-08526709d49fb16e402da1ed514f1d8ba, the event version and the fingerprint are not tab-separated. I have been reading the documentation and trying to find what is causing this behaviour, but I have not been successful so far.

Any help is appreciated, thank you!

Hi @Sonny ,

To insert enriched events to Postgres, you can use our Postgres loader.

I tried Snowplow Analytics SDK but the tool was unable to parse my events

Which SDK did you use? Could you show the error that you’re getting, please?

As far as I know from the documentation, the enriched events are expected to have same order and format, but I noticed that in my case they resulted every time in different outputs

That’s correct. I’m afraid that your samples don’t contain the tabs. Would you mind posting them again with the \t characters, please?

BTW, we would recommend using enrich 1.4.2, which is the latest version. Installation instructions can be found on our docs website.


Hi @BenB, thanks for your reply.

I tried the Postgres Loader, but it was only compatible with Kinesis and GC Pub, while I am using Kafka (I am not aware whether Kafka is available with the Postgres Loader; I could only find the configuration for Kinesis in the official setup documentation and in the GitHub repo).

I used the Scala Analytics SDK in a Java Maven Project just for testing my output, following this link:
https://github.com/snowplow-incubator/snowplow-scala-analytics-sdk-java-example
I tried the Analytics SDK with some examples I got from Discourse, and everything works fine for those events. When I try to parse my own events, it just gives the output “Unable to parse”, which I believe is happening because of the missing values in the enriched data. Unfortunately, I am not getting any raw error, so I can only guess that the format of the data is not valid. (Does the output of the enricher have a specific length in terms of TSV, meaning a fixed number of output values by default?)
Validated<ParsingError, Event> validatedEvent = Event.parse(record);
// Check if String has been parsed correctly
if (validatedEvent.isValid()) // This one is returning false :/


I am attaching the screenshots of the output below just in case.
The first one is for an event I found somewhere here in discourse. I used it as an example to test the SDK:

{"app_id":"test","platform":"web","etl_tstamp":"2019-10-30T17:30:34.302Z","collector_tstamp":"2019-10-30T17:30:29.169Z","dvce_created_tstamp":"2019-10-30T17:30:31.755Z","event":"page_view","event_id":"b173ab5d-54c7-4e31-8a25-d28c658ee73d","txn_id":null,"name_tracker":"cf","v_tracker":"js-2.11.0","v_collector":"ssc-0.16.0-kafka","v_etl":"stream-enrich-0.21.0-common-0.37.0","user_id":null,"user_ipaddress":"89.31.99.x","user_fingerprint":"1776775914","domain_userid":"e3201a9a-2baf-4a22-8f80-08f7b9538ecd","domain_sessionidx":8,"network_userid":"2c094c4f-d642-4f5e-8f78-93ba4a035a05","geo_country":"NL","geo_region":null,"geo_city":null,"geo_zipcode":null,"geo_latitude":52.3824,"geo_longitude":4.8995,"geo_region_name":null,"ip_isp":null,"ip_organization":null,"ip_domain":null,"ip_netspeed":null,"page_url":"https://example.com/test.php","page_title":"My Website with Snowplow Collector","page_referrer":null,"page_urlscheme":"https","page_urlhost":"example.com","page_urlport":443,"page_urlpath":"/test.php","page_urlquery":null,"page_urlfragment":null,"refr_urlscheme":null,"refr_urlhost":null,"refr_urlport":null,"refr_urlpath":null,"refr_urlquery":null,"refr_urlfragment":null,"refr_medium":null,"refr_source":null,"refr_term":null,"mkt_medium":null,"mkt_source":null,"mkt_term":null,"mkt_content":null,"mkt_campaign":null,"contexts":{},"se_category":null,"se_action":null,"se_label":null,"se_property":null,"se_value":null,"unstruct_event":null,"tr_orderid":null,"tr_affiliation":null,"tr_total":null,"tr_tax":null,"tr_shipping":null,"tr_city":null,"tr_state":null,"tr_country":null,"ti_orderid":null,"ti_sku":null,"ti_name":null,"ti_category":null,"ti_price":null,"ti_quantity":null,"pp_xoffset_min":null,"pp_xoffset_max":null,"pp_yoffset_min":null,"pp_yoffset_max":null,"useragent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.121 Safari/537.36 
Vivaldi/2.8.1664.44","br_name":null,"br_family":null,"br_version":null,"br_type":null,"br_renderengine":null,"br_lang":"en-US","br_features_pdf":true,"br_features_flash":false,"br_features_java":false,"br_features_director":false,"br_features_quicktime":false,"br_features_realplayer":false,"br_features_windowsmedia":false,"br_features_gears":false,"br_features_silverlight":false,"br_cookies":true,"br_colordepth":"24","br_viewwidth":1600,"br_viewheight":409,"os_name":null,"os_family":null,"os_manufacturer":null,"os_timezone":"Europe/Berlin","dvce_type":null,"dvce_ismobile":null,"dvce_screenwidth":1280,"dvce_screenheight":720,"doc_charset":"UTF-8","doc_width":1600,"doc_height":409,"tr_currency":null,"tr_total_base":null,"tr_tax_base":null,"tr_shipping_base":null,"ti_currency":null,"ti_price_base":null,"base_currency":null,"geo_timezone":"Europe/Amsterdam","mkt_clickid":null,"mkt_network":null,"etl_tags":null,"dvce_sent_tstamp":"2019-10-30T17:30:31.759Z","refr_domain_userid":null,"refr_dvce_tstamp":null,"derived_contexts":{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","data":[{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Desktop","layoutEngineNameVersion":"Blink 77.0","operatingSystemNameVersion":"Windows 7","layoutEngineNameVersionMajor":"Blink 77","operatingSystemName":"Windows NT","agentVersionMajor":"2","layoutEngineVersionMajor":"77","deviceClass":"Desktop","agentNameVersionMajor":"Vivaldi 2","deviceCpuBits":"64","operatingSystemClass":"Desktop","layoutEngineName":"Blink","agentName":"Vivaldi","agentVersion":"2.8.1664.44","layoutEngineClass":"Browser","agentNameVersion":"Vivaldi 2.8.1664.44","operatingSystemVersion":"7","deviceCpu":"Intel 
x86_64","agentClass":"Browser","layoutEngineVersion":"77.0"}}]},"domain_sessionid":"8142b300-4607-4bd1-95e3-bb54b3563861","derived_tstamp":"2019-10-30T17:30:29.165Z","event_vendor":"com.snowplowanalytics.snowplow","event_name":"page_view","event_format":"jsonschema","event_version":"1-0-0","event_fingerprint":"43e44f37a65e062c352fbad03e1f173b","true_tstamp":null}




The second one is the output when I am using the event of my enricher.


Well, I thought that might be an issue and I tried it using the docker images today, but it was giving the same output, unfortunately.


sample-app-https\tweb\t2021-02-09\t10:14:44.695\t2021-02-09\t10:14:39.690\t2021-02-09 10:14:39.246\tpage_ping\t5f0e3a72-fe02-4b98-8253-18cdf2d98050\t\tbc\tjs-2.17.0\tssc-1.0.1-kafka\tstream-enrich-1.0.0-common-1.0.0\t\t127.0.0.x\t\t0e9e74af-06b3-4e00-81cc-6d358c73a6dc\t19\t919a7844-1b17-40ba-a6a1-d4a9f5540eda\t\t\t\t\t\thttps://name.blob.core.windows.net/$web/index.html\tSnowplow Sample Webapp\t\thttps\name.blob.core.windows.net\t443\t/$web/index.html\t\t\t{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0","data":{"id":"ce5a4c2b-f1b2-4a10-b933-82de91856ac7"}},{"schema":"iglu:org.w3/PerformanceTiming/jsonschema/1-0-0","data":{"navigationStart":1612865649061,"unloadEventStart":1612865649173,"unloadEventEnd":1612865649174,"redirectStart":0,"redirectEnd":0,"fetchStart":1612865649061,"domainLookupStart":1612865649069,"domainLookupEnd":1612865649110,"connectStart":1612865649110,"connectEnd":1612865649147,"secureConnectionStart":1612865649126,"requestStart":1612865649147,"responseStart":1612865649159,"responseEnd":1612865649171,"domLoading":1612865649173,"domInteractive":1612865649186,"domContentLoadedEventStart":1612865649187,"domContentLoadedEventEnd":1612865649190,"domComplete":1612865649244,"loadEventStart":1612865649244,"loadEventEnd":1612865649244}}]}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t00\t0\t0\tMozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0\t\t\t\t\t\ten-US\t\t\t\t\t\t\t\t124\t1280\t616\t\t\t\tEurope/Berlin\t\t\t1280\t720\twindows-1252\t1280\t616\t\t\t\t\t\t\t\t\t\t\t2021-02-09 10:14:39.247\t\t{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0","data":{"useragentFamily":"Firefox","useragentMajor":"85","useragentMinor":"0","useragentPatch":null,"useragentVersion":"Firefox 
85.0","osFamily":"Windows","osMajor":"10","osMinor":null,"osPatch":null,"osPatchMinor":null,"osVersion":"Windows 10","deviceFamily":"Other"}}]}\tc67f9ef9-3622-452f-b11d-c5d614ab4753\t2021-02-09 10:14:39.689\tcom.snowplowanalytics.snowplow\tpage_ping\tjsonschema\t1-0-005e84e07927c18bb563920504db3b183\t

Thanks again and in case anything is needed please let me know:)

Sorry, I missed the fact that you’re using Kafka; indeed, it’s not supported yet by the Postgres loader.

How/where do you read the enriched events in the output of enrich?

Yes, the output is always the same. You can find the list of fields here.

Have you checked that your event and the event that you found on Discourse have the exact same number of fields?
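
One way to run that comparison is to count the tab-separated fields of each line. Below is a minimal Java sketch, not taken from any Snowplow SDK: the class and method names are hypothetical, and the value 131 is an assumption about the canonical enriched event that should be confirmed against the field list for your Enrich version.

```java
// Hypothetical helper for illustration; not part of any Snowplow library.
class EnrichedFieldCount {

    // Assumption: the canonical enriched event has 131 tab-separated fields.
    // Confirm this number against the field list for your Enrich version.
    static final int EXPECTED_FIELDS = 131;

    // Counts tab-separated fields. The limit of -1 keeps trailing empty
    // strings; with the default limit, String.split drops empty fields at
    // the end of the line and the count comes out too low.
    static int countFields(String line) {
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) {
        // Shortened, made-up line: 7 fields, several of them empty.
        String sample = "app\tweb\t\t\tpage_ping\t\t";
        System.out.println("fields: " + countFields(sample)
                + " (expected for a full event: " + EXPECTED_FIELDS + ")");
    }
}
```

If two events report different counts, the difference most likely comes from lost tab characters rather than from the enricher, since empty fields are still delimited by tabs.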


Hi,
Thanks for your quick reply:)

For testing purposes, I read the events with a Kafka script that consumes from the topic and displays the events in the terminal. Then I just copy them from the terminal and test locally with the SDK. Do you think that I might have lost some tab characters when I copied them?

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic snowplow-enriched-good-events-stream

Regarding the whole pipeline, I have set up a Kafka Streams application in Java that reads from one Kafka topic (snowplow-good-enriched-envents-topic), processes the output (using the SDK or any other converter) and writes the result to another topic, which is then connected to PostgreSQL via the JDBC Kafka Sink Connector from the Confluent platform (which only works with JSON or AVRO format).

Thanks, I will have a look into that.

Well, my events are shorter than they should be :sweat_smile:. When I tried, for example, to map each value to a specific field, the mapping never reached the true_tstamp field, which I assume is the last one. It stops somewhere between the br_features_java and doc_height fields, depending on the number of fields the enricher outputs. (My event_fingerprint is mapped to br_features_java.) When I tested with the other events, all the fields mapped perfectly fine. It is as if my events were missing some values that are supposed to be null/empty, but I can't figure out why.

This is a screenshot from when I try to map one of my own events (please notice that when I use my own events the mapping always varies depending on the output).
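
A mapping like the one described here can be made to fail loudly when the number of values does not match the header, instead of silently shifting values onto the wrong fields. The following is a minimal Java sketch under stated assumptions: the class name, the method name and the shortened four-field header are hypothetical, and a real header would need the full field list in the documented order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical converter for illustration; not the Snowplow Analytics SDK.
class EnrichedFieldMapper {

    // Pairs each tab-separated value with its field name, in order.
    // Throws if the counts differ, which is the symptom of lost tabs.
    static Map<String, String> toMap(List<String> fieldNames, String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        String[] values = line.split("\t", -1);
        if (values.length != fieldNames.size()) {
            throw new IllegalArgumentException("Expected " + fieldNames.size()
                    + " fields but got " + values.length);
        }
        Map<String, String> event = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            // An empty string means the field was not set; store it as null.
            event.put(fieldNames.get(i), values[i].isEmpty() ? null : values[i]);
        }
        return event;
    }

    public static void main(String[] args) {
        // Shortened, made-up header; a real one lists every enriched field.
        List<String> names = Arrays.asList("app_id", "platform", "etl_tstamp", "event");
        System.out.println(toMap(names, "sample-app-https\tweb\t\tpage_ping"));
    }
}
```

With this check in place, an event whose tabs were lost is rejected with the expected and actual field counts rather than producing a mapping that stops partway through the header.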


Below is the mapping from another enriched event.

Thanks again.

I strongly suspect that the tabs get lost in this process. Either they are already missing in the output of kafka-console-consumer.sh, or they are there but are not picked up when you copy from the terminal. Instead of displaying the output in the terminal, you could try redirecting it to a file with bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic snowplow-enriched-good-events-stream > enriched.txt

If tabs are still missing, you probably want to use a real Kafka consumer to consume data.
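
To check whether the tabs survived in a redirected file, the field count of every line can be verified programmatically. This is a minimal Java sketch, assuming one event per line in a file such as enriched.txt; the class and method names are hypothetical, and the default of 131 expected fields is an assumption to confirm against the field list for your Enrich version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for illustration; not part of Kafka or Snowplow.
class EnrichedFileCheck {

    // Returns the 1-based numbers of the lines whose tab-separated
    // field count differs from the expected count.
    static List<Integer> badLines(Path file, int expectedFields) throws IOException {
        List<Integer> bad = new ArrayList<>();
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (int i = 0; i < lines.size(); i++) {
            // Limit -1 keeps trailing empty fields in the count.
            if (lines.get(i).split("\t", -1).length != expectedFields) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "enriched.txt");
        // Assumption: 131 fields per enriched event unless overridden.
        int expected = args.length > 1 ? Integer.parseInt(args[1]) : 131;
        if (!Files.exists(file)) {
            System.out.println("File not found: " + file);
            return;
        }
        System.out.println("Lines with an unexpected field count: "
                + badLines(file, expected));
    }
}
```

An empty result suggests the tabs are intact in the file, so the loss happens when copying from the terminal.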


@BenB thanks a lot for your contribution. Yes, now it is fine. It seems that a lot of tabs get lost when the events are copied from the terminal, no idea why. Once again thanks, take care :slight_smile:
