Repo with custom enrichments


#1

Collector: clojure
Enricher: EMR ETL Runner

I've set up an Iglu repo by cloning iglu-central, and also added a JSON template for unstructured events called click_event. I'm a little confused about the role that the enricher files play with the EMR ETL Runner. If I do not pass the enricher flag pointing to this directory, my Spark job completes, but the values for our unstructured events are missing. If I pass the enricher file directory, the Spark job fails on the “Elasticity Spark Step: Enrich Raw Events” step. The logs are not very clear about the cause.

Is my understanding correct that the enricher files are used to check the structure of the JSON files in Iglu? If so, would I just need to create an enricher file that matches the structure of the JSON file hosted in our repo, so that the fields match?


#3

@ehubbard,

I assume by “enricher file directory” you mean the enrichment directory you pass to EmrEtlRunner with option --enrichments.

That folder contains configuration files for the configurable enrichments, which dimension-widen your data (add extra fields to each event) during the enrichment process. Enabling them also means your EMR cluster needs sufficient resources for the extra data processing.

I suspect that enabling the enrichments caused a failure for one of the following reasons:

  • the enrichment configuration files might not be set up correctly (since you are unsure what that folder is for)
  • your EMR cluster has run out of available resources, causing a crash at “Enrich Raw Events”. You might need to bump up the cluster size to overcome the problem
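
To rule out the first cause before touching the cluster, a quick local check of the enrichments folder might help. The Python sketch below is only an illustration, not something EmrEtlRunner does or ships: it assumes each enrichment configuration is a self-describing JSON with a `schema` URI starting with `iglu:` and a `data` object containing `name`, `vendor` and `enabled`, as in the example enrichment configs. The function names and the default `enrichments` path are made up for this example.

```python
import json
import sys
from pathlib import Path

# Keys assumed to be present in the "data" object of an enrichment config.
REQUIRED_DATA_KEYS = ("name", "vendor", "enabled")


def check_enrichment_file(path):
    """Return a list of problems found in one enrichment config file."""
    try:
        doc = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        return [f"cannot read or parse JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level is not a JSON object"]

    problems = []
    schema = doc.get("schema")
    if not (isinstance(schema, str) and schema.startswith("iglu:")):
        problems.append('missing or non-Iglu "schema" URI')

    data = doc.get("data")
    if not isinstance(data, dict):
        problems.append('missing "data" object')
        return problems

    for key in REQUIRED_DATA_KEYS:
        if key not in data:
            problems.append(f'"data" is missing "{key}"')
    return problems


def check_enrichments_dir(directory):
    """Map each *.json file name in the directory to its list of problems."""
    files = sorted(Path(directory).glob("*.json"))
    return {p.name: check_enrichment_file(p) for p in files}


if __name__ == "__main__":
    # Hypothetical default path; pass your own enrichments folder instead.
    target = sys.argv[1] if len(sys.argv) > 1 else "enrichments"
    for name, problems in check_enrichments_dir(target).items():
        print(name, "OK" if not problems else "; ".join(problems))
```

A plain JSON Schema file (such as one describing click_event) placed in that folder would be flagged by this check, since it is an event schema rather than an enrichment configuration. Passing the check only shows the files have the assumed outer shape; it does not validate the enrichment-specific `parameters`.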