Hi Snowplow Team,
We are currently evaluating Snowplow and have had it running for almost a month with this pipeline: NSQ (for raw and enriched data processing) -> Logstash (for transforming the data from TSV to JSON) -> Elasticsearch (storage) -> Kibana (analysis and visualization). Although we are able to get real-time data with this pipeline, we were advised that Redshift can provide in-depth data that Elasticsearch may lack. I am looking for the best platform or pipeline for real-time analysis in case we want to use Redshift. Should we drop the current setup, or can NSQ still work as our messaging platform?
Based on what I have researched, it seems we must use the full AWS pipeline if we opt for Redshift, which might be costly for an evaluation. Ideally, we would like to keep costs as low as possible and avoid depending heavily on AWS, except for Redshift itself. Is there any other approach to this setup? Can Redshift really be used for real-time processing, or is it meant for batch processing?
Thanks in advance! Looking forward to your suggestions. I’d be glad to know what really works well for this.