I have set up a real-time pipeline: Scala Stream Collector > Stream Enrich > S3 Loader. I moved to it from a batch pipeline that used the Clojure Collector on Elastic Beanstalk.
I use AWS Athena to query bad events. With the batch pipeline, I was able to query bad events using Presto functions:
This gave me a TSV line of raw events, which was then easy to deconstruct and parse in SQL queries.
But since I moved to the Scala Stream Collector, the data in the line of bad events has become undecipherable, possibly because the line is Thrift encoded. Is there a way to make it more human readable and, more importantly, queryable?
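
To illustrate what I mean by "more human readable", here is a rough Python sketch. It assumes the line value is a base64-encoded binary (Thrift) payload, which I have not confirmed. It does not parse Thrift at all; it only base64-decodes the value and pulls out runs of printable ASCII, so strings such as IP addresses or querystrings become visible. The function name printable_runs and the min_len parameter are my own, hypothetical names.

```python
import base64
import re


def printable_runs(encoded_line: str, min_len: int = 4) -> list[str]:
    """Base64-decode a bad-row line and return its runs of printable ASCII.

    Assumption: the line is base64-encoded binary data. This is NOT a
    Thrift parser; it only surfaces the human-readable strings embedded
    in the decoded bytes.
    """
    raw = base64.b64decode(encoded_line)
    # Match runs of at least min_len printable ASCII bytes (space to tilde).
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, raw)]
```

This only helps with eyeballing individual rows outside Athena; it does not make the data queryable in SQL, which is what I am really after.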