How to debug the shredding process?


#1

I’ve just wasted quite a lot of time debugging an issue with my shredding process - I created a custom JavaScript enrichment that outputs a custom type. The enrichment worked fine, but my JSON schema was incorrect, so the records were never shredded. It took me forever to locate the issue, though, because there were no errors whatsoever telling me what had happened - the bad shredded output bucket contained only 0-byte files, and the records were completely missing from the good shredded bucket.

Is there something I’m missing or have misconfigured? Should there be an error message or a bad record somewhere when validation fails during shredding? Is there any way to test just the shredding process outside of EMR, so that debugging a change doesn’t take 20-30 minutes? Thanks!
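(Editor's note: not an official Snowplow tool, but one way to shortcut the EMR feedback loop is to validate the enrichment's output against your JSON schema locally with Python's `jsonschema` library before launching a job. The schema and event below are hypothetical stand-ins for your custom type.)

```python
import jsonschema

# Hypothetical schema for a custom self-describing type; substitute
# the actual JSON schema your enrichment's output must conform to.
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "interactionId": {"type": "string"},
        "durationMs": {"type": "integer", "minimum": 0},
    },
    "required": ["interactionId"],
    "additionalProperties": False,
}

# Example payload as the enrichment might emit it - here durationMs
# is wrongly a string, which would make shredding validation fail.
event = {"interactionId": "abc-123", "durationMs": "250"}

# Snowplow schemas are typically draft-04, so use the matching validator.
validator = jsonschema.Draft4Validator(schema)
for err in sorted(validator.iter_errors(event), key=lambda e: list(e.path)):
    print(f"{list(e.path) if False else list(err.path)}: {err.message}")
```

Seeing the validation error printed locally in seconds beats inferring it from empty buckets after a 20-30 minute EMR run.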


#2

Hi @mrosack - that’s very unusual. Did you check the errors bucket as well - was that also empty? Every row of input should end up in either good, bad, or errors. If it doesn’t, then that’s a bug in our shredding process.

Would you be able to share an example enriched event which disappears in shredding? We can then trace that through and figure out what is going on.
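(Editor's note: a quick way to check the invariant above is to sync the relevant prefixes locally, e.g. with `aws s3 sync`, and compare row counts. This is a sketch with hypothetical local paths, not an official Snowplow utility.)

```python
import gzip
from pathlib import Path

def count_rows(prefix):
    """Count newline-delimited rows across all part files under a
    locally synced bucket prefix (gzipped or plain)."""
    total = 0
    for part in Path(prefix).rglob("part-*"):
        opener = gzip.open if part.suffix == ".gz" else open
        with opener(part, "rt") as fh:
            total += sum(1 for _ in fh)
    return total

# Hypothetical local copies of the input and the three output locations:
# enriched = count_rows("local/enriched")
# accounted = (count_rows("local/shredded/good")
#              + count_rows("local/shredded/bad")
#              + count_rows("local/errors"))
# If enriched != accounted, rows are vanishing somewhere in shredding.
```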


#3

I thought the errors bucket was the culprit at first - I didn’t have one configured, but even after I set it up it stayed empty and nothing changed. Attached are the enriched events I was testing with - if the shredding process can’t find a schema for, or fails to validate, com.ferritelabs.snowplow/touchpoint_interaction/jsonschema/1-0-0, the records disappear.
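(Editor's note: one thing worth ruling out here is the schema lookup itself. An Iglu static registry serves a key like the one above from `schemas/{vendor}/{name}/jsonschema/{version}` under the registry root. The small helper below - my own, not part of any Snowplow library - maps the key to that path so you can confirm the schema file exists and parses before blaming validation.)

```python
from pathlib import Path

def iglu_schema_path(repo_root, schema_key):
    """Map a schema key such as
    'com.ferritelabs.snowplow/touchpoint_interaction/jsonschema/1-0-0'
    to the file path an Iglu static registry serves it from."""
    vendor, name, fmt, version = schema_key.split("/")
    return Path(repo_root) / "schemas" / vendor / name / fmt / version

path = iglu_schema_path(
    "/opt/iglu-repo",  # hypothetical registry root
    "com.ferritelabs.snowplow/touchpoint_interaction/jsonschema/1-0-0",
)
print(path)
# If no file exists at this path (or it isn't valid JSON), the
# shredder's resolver will fail the lookup rather than validation.
```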

https://s3-us-west-1.amazonaws.com/snowplow-example-shred-failure/part-00001.gz

Thanks for your help!


#4

Thanks, created:

The title reflects the fact that we won’t know for sure if Spark Shred exhibits the same issue you found in Hadoop Shred.