Understanding the errors bucket in the Snowplow Enrichment process


#1

Hi,

I would like to understand the Snowplow enrichment process, specifically how logs end up in the bad and errors folders. As I understand it, all raw logs go through validation and parsing, and any logs with issues are directed to the enriched/bad and shredded/bad folders. But sometimes the enrichment job fails and all the raw logs end up in enriched/errors or shredded/errors. What are the possible causes of raw logs ending up in the errors folders? I also looked through the job logs for anything that could help me find the cause, but could not find any entry explaining why the step failed. Please let me know how to diagnose an enrichment job failure.

Thanks,
Shilpi Singh


#2

Hi Shilpi,

It’s a good question. enriched/bad is generated by the Snowplow Enrichment process itself: we have total control over this output, and try to make it as detailed and actionable as possible. By reading the error messages attached to the raw events, you should be able to understand what has gone wrong in your source events and how you can resolve it (e.g. if/how you can run Hadoop Event Recovery).
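As a rough illustration of reading those error messages in bulk, here is a minimal Python sketch that tallies the messages found in a set of bad rows. It assumes each bad row is one JSON object per line with a "line" field (the raw event) and an "errors" array; those field names, the `tally_errors` helper and the sample messages are assumptions made for the example, so check them against your own enriched/bad files.

```python
import json
from collections import Counter


def tally_errors(bad_row_lines):
    """Count how often each error message appears across bad rows."""
    counts = Counter()
    for raw in bad_row_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        row = json.loads(raw)
        # Assumed layout: {"line": <raw event>, "errors": [...]}
        for err in row.get("errors", []):
            # An error may be a plain string or an object with a "message" field
            if isinstance(err, dict):
                message = err.get("message", str(err))
            else:
                message = str(err)
            counts[message] += 1
    return counts


# Made-up sample rows, only to show the shape the sketch expects
sample_bad_rows = [
    '{"line": "raw-event-1", "errors": ["made-up error A"]}',
    '{"line": "raw-event-2", "errors": ["made-up error A", "made-up error B"]}',
    '{"line": "raw-event-3", "errors": [{"message": "made-up error A"}]}',
]

for message, n in tally_errors(sample_bad_rows).most_common():
    print(n, message)
```

Sorting by frequency like this tends to show quickly whether one systematic problem in the source events accounts for most of the bad rows.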

By contrast, enriched/errors is a built-in feature of Cascading, the Hadoop ETL pipeline that we use under the hood in the Snowplow Hadoop Enrichment process. What we call enriched/errors and continue_on_unexpected_error, Cascading calls the “failure trap”. Any source event which throws an uncaught exception during the Hadoop processing will be safely routed to the failure trap.

Unfortunately, Cascading’s failure trap only includes the source rows - it doesn’t record the uncaught exception in any way. This makes it difficult to investigate what is going on. One workaround is:
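To make the limitation concrete, here is a small Python sketch of the failure-trap idea. It is not Cascading's API; `run_with_trap` and `toy_enrich` are hypothetical names invented for the illustration. The point it shows is that the trap keeps the offending source row but throws away the exception that caused it.

```python
def run_with_trap(rows, process):
    """Apply process to each row; rows that raise are routed to the trap."""
    output, trapped = [], []
    for row in rows:
        try:
            output.append(process(row))
        except Exception:
            # Only the source row is kept. The exception itself is dropped,
            # which is why the trapped rows say nothing about the cause.
            trapped.append(row)
    return output, trapped


def toy_enrich(row):
    # Hypothetical stand-in for enrichment: expects "name<TAB>integer"
    name, value = row.split("\t")
    return (name, int(value))


ok, errors = run_with_trap(["a\t1", "b\tnot-a-number", "c\t3"], toy_enrich)
print(ok)      # [('a', 1), ('c', 3)]
print(errors)  # ['b\tnot-a-number']
```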

  1. Download a source event from enriched/errors
  2. Replace the input of one of the job tests in the Hadoop Enrich test suite with the event from step 1 (e.g. this line here)
  3. Disable the error trap here
  4. Run the test
  5. Observe what exception is thrown
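The idea behind these steps can be sketched in a few lines of Python: feed a single trapped row through the processing function with no trap around it, and capture the exception that surfaces. This is only an analogy for the workflow, not the Hadoop Enrich test suite itself; `replay`, `toy_enrich` and the sample row are hypothetical.

```python
import traceback


def toy_enrich(row):
    # Hypothetical stand-in for enrichment: expects "name<TAB>integer"
    name, value = row.split("\t")
    return (name, int(value))


def replay(row, process):
    """Run one trapped row with no trap; return the traceback text, or None."""
    try:
        process(row)
    except Exception:
        # With the trap out of the way, the full exception is available
        return traceback.format_exc()
    return None


trapped_row = "b\tnot-a-number"  # stands in for an event taken from enriched/errors
report = replay(trapped_row, toy_enrich)
print(report or "row processed cleanly")
```

Replaying one row at a time keeps the traceback tied to a specific event, which makes it easier to tell whether the failure comes from the data or from the job code.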

Hope this helps!