Data modeling using Map Reduce

mike · October 2, 2017, 1:18pm

It’s worth reading the recent RFC from @alex on porting the Snowplow pipeline to GCP. It moves further away from a Lambda architecture by shifting stream processing into Beam/Dataflow rather than relying on something like EMR.

There are still some tricky issues around streaming (such as deduplication and exactly-once semantics), but it certainly looks like an interesting way forward for analytics infrastructure.
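Part of what makes deduplication tricky in a stream is that the job can't remember every event ID forever. Here is a minimal sketch, assuming each event carries a unique `event_id` and that timestamps arrive in non-decreasing (processing-time) order. `WindowedDeduplicator` is a hypothetical helper, not part of Snowplow or Beam. It keeps only a time-bounded window of seen IDs, which bounds memory at the cost of letting late duplicates through:

```python
from collections import OrderedDict


class WindowedDeduplicator:
    """Hypothetical helper: drop events whose event_id was seen within the last `ttl` seconds.

    Assumes `now` values never decrease (processing time), so the oldest
    entries always sit at the front of the OrderedDict.
    """

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._seen = OrderedDict()  # event_id -> time first seen, oldest first

    def _evict(self, now: float) -> None:
        # Forget ids that fell out of the window so state stays bounded.
        while self._seen:
            oldest_ts = next(iter(self._seen.values()))
            if now - oldest_ts < self.ttl:
                break
            self._seen.popitem(last=False)

    def accept(self, event_id: str, now: float) -> bool:
        """Return True if the event should be kept, False if it is a duplicate."""
        self._evict(now)
        if event_id in self._seen:
            return False
        self._seen[event_id] = now
        return True


dedup = WindowedDeduplicator(ttl=60.0)
events = [("a", 0.0), ("b", 1.0), ("a", 2.0), ("a", 120.0)]
kept = [eid for eid, t in events if dedup.accept(eid, t)]
print(kept)  # ['a', 'b', 'a']
```

The duplicate "a" at t=2 is dropped, but the one at t=120 is kept because its ID has already been evicted. Closing that gap (or getting true exactly-once output) needs durable state or a downstream batch pass, which is exactly the trade-off being discussed.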
