Skip archive_enriched


#1

What exactly does the option "--skip archive_enriched" do? Does it only skip archiving of the enriched/good files?


#2

@RichardJ,

Yes, it skips step 12 as per the dataflow diagram. Essentially, the data will be loaded to the storage target, but neither the enriched nor the shredded files will be archived.

PS. The diagram is fully accurate for Snowplow releases up to R87.


#3

Thanks Ihor! Your response is very helpful.

Do you have a similar dataflow diagram for Snowplow Stream processing?


#4

@RichardJ,

Not in that format. We do, however, have a general real-time pipeline (Lambda) architecture diagram: How to setup a Lambda architecture for Snowplow. It depicts just one of the possible approaches. Since the diagram was posted, it has become possible to use stream-enriched data in the batch branch, which avoids the enrichment step in the batch pipeline: run EmrEtlRunner with --skip staging,enrich.
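
For anyone scripting these runs, here is a minimal Python sketch of how the --skip value used in this thread could be assembled into a command line. The helper name build_emr_etl_runner_command, the binary path ./snowplow-emr-etl-runner, and the config.yml / resolver.json file names are assumptions for illustration only. The exact invocation differs between releases, so check it against the version you run. The sketch only builds and prints the command; it does not execute anything.

```python
import shlex


def build_emr_etl_runner_command(
    skip_steps,
    runner="./snowplow-emr-etl-runner",  # assumed path to the binary
    config="config.yml",                 # assumed config file name
    resolver="resolver.json",            # assumed resolver file name
):
    """Return the EmrEtlRunner command as a list of arguments.

    skip_steps is a list of step names, e.g. ["staging", "enrich"];
    they are joined with commas into a single --skip value.
    """
    cmd = [runner, "--config", config, "--resolver", resolver]
    if skip_steps:
        cmd += ["--skip", ",".join(skip_steps)]
    return cmd


def as_shell_string(cmd):
    """Render the argument list as a shell-safe string for display."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Batch run that reuses stream-enriched data (post #4).
    print(as_shell_string(build_emr_etl_runner_command(["staging", "enrich"])))
    # Run that loads the data but archives nothing (post #2).
    print(as_shell_string(build_emr_etl_runner_command(["archive_enriched"])))
```

Keeping the steps as a list and joining them at the end avoids stray spaces in the --skip value, which has to be a single comma-separated argument.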