Understanding if Iglu schemas have been successfully deployed


#1

I recently created a new Iglu schema for some new events. We deployed it as normal, yet the new schema doesn’t get picked up: all the events conforming to this schema are sent to ‘bad data’ instead of being enriched.

I’ve tried the advice in “Understanding schema validation and caching in Snowplow” of turning it off and on again.

I’m using R83 Bald with Stream Enrich.

It would be useful to know how to check whether the schema was correctly deployed, or whether we made, for example, a configuration error.


#2

@springcoil, I assume by “iglu schema” you mean a JSON schema. I also assume that the bad data is due to an error like “missing schema” and that the bucket used for this purpose is public.

  1. Have you tried to access the schema via its Iglu URI? Is it listed there?
  2. If it’s listed, is it in the correct format (paths, properties, etc.)?
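
One way to check the format point is to compare the `self` block inside the schema file with the key the enrichment is looking for, since a mismatch there is easy to miss. A minimal Python sketch, assuming the schema is a self-describing JSON schema with `vendor`, `name`, `format` and `version` under `self`; the schema body and the helper names below are made up for illustration:

```python
import json


def self_key(schema: dict) -> str:
    """Build the iglu: key from the schema's own `self` block."""
    meta = schema["self"]
    return "iglu:{}/{}/{}/{}".format(
        meta["vendor"], meta["name"], meta["format"], meta["version"]
    )


def matches_key(schema_text: str, expected_key: str) -> bool:
    """Return True when the schema describes itself with the expected key."""
    return self_key(json.loads(schema_text)) == expected_key


# Hypothetical schema body; only the `self` block matters for this check.
example = """
{
  "self": {
    "vendor": "com.busuu",
    "name": "welovebusuu_event",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {}
}
"""

print(matches_key(example, "iglu:com.busuu/welovebusuu_event/jsonschema/1-0-0"))
```

If this prints `False` for the file you deployed, the `self` block and the key your events reference disagree.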

#3

Hi @ihor, here is the error message:

","errors":[{"level":"error","message":"error: Could not find schema with key iglu:com.busuu/welovebusuu_event/jsonschema/1-0-0 in any repository, tried:\\n level: \\"error\\"\\n repositories: [\\"Iglu Central [HTTP]\\",\\"Iglu Client Embedded [embedded]\\",\\"busuu Iglu Repo [HTTP]\\"]\\n"}],"failure_tstamp":"2019-01-04T17:54:48.664Z"}']

  • We are using S3 to host our Iglu schemas and we’ve mirrored Iglu Central. Our S3 bucket isn’t public, but we are able to access the schemas via the Iglu URI.
  • It has the correct format, path and properties. I do indeed mean ‘JSON schema’.

#4

@springcoil, if it’s not public, how sure are you that the Iglu client (enrichment) can access your Iglu schema? Do you host any other custom schemas that are accessible from the EMR cluster during the enrichment step?

Do note it is imperative that the JSON schema is located at a path like s3://<iglu-bucket>/schemas/com.busuu/welovebusuu_event/jsonschema/1-0-0 (note the schemas folder).
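
To double-check the expected location, the schema key from the error message can be turned into that path mechanically. A rough Python sketch; the bucket name `my-iglu-bucket` and the function names are placeholders, not anything from your setup:

```python
def schema_path(iglu_key: str) -> str:
    """Map an iglu: key to its path inside a static repo (under schemas/)."""
    prefix = "iglu:"
    if not iglu_key.startswith(prefix):
        raise ValueError("not an iglu: key: " + iglu_key)
    parts = iglu_key[len(prefix):].split("/")
    # Expect exactly vendor / name / format / version.
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected vendor/name/format/version: " + iglu_key)
    return "schemas/" + "/".join(parts)


def s3_location(bucket: str, iglu_key: str) -> str:
    """Full s3:// location where the schema file is expected."""
    return "s3://{}/{}".format(bucket, schema_path(iglu_key))


# Placeholder bucket name, key taken from the error message.
print(s3_location("my-iglu-bucket",
                  "iglu:com.busuu/welovebusuu_event/jsonschema/1-0-0"))
# prints: s3://my-iglu-bucket/schemas/com.busuu/welovebusuu_event/jsonschema/1-0-0
```

The file should sit at exactly that location, with `1-0-0` as the object name and no extra extension.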


#5

We host other schemas and we’re able to access them with our igluclient.


#6

It is not the same. The enrichment is run on an EMR cluster that is spun up every time you start EmrEtlRunner. Accessing Iglu with Igluctl via the AWS S3 API is one thing; accessing the HTTP service of your Iglu server is another. Not only are the servers from which you access Iglu different, but the services used by Igluctl and the Iglu client are different too.
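
To see what the enrichment sees, you could request the schema over plain, unauthenticated HTTP from a machine in the same network as the enrichment, rather than through the S3 API. A minimal Python sketch using only the standard library; the repository URI `http://iglu.example.com` is a placeholder for whatever URI your resolver configuration lists for “busuu Iglu Repo”, and the function names are my own:

```python
import urllib.error
import urllib.request

KEY = "iglu:com.busuu/welovebusuu_event/jsonschema/1-0-0"


def schema_url(repo_uri: str, iglu_key: str) -> str:
    """URL of a schema in an HTTP repo: <repo>/schemas/<vendor>/<name>/<format>/<version>."""
    if not iglu_key.startswith("iglu:"):
        raise ValueError("not an iglu: key: " + iglu_key)
    return repo_uri.rstrip("/") + "/schemas/" + iglu_key[len("iglu:"):]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Plain GET without credentials; returns the HTTP status code.

    Connection-level failures (DNS, refused connection) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def is_reachable(repo_uri: str, iglu_key: str, status_of=http_status) -> bool:
    """True when the schema URL answers 200 to an anonymous request."""
    return status_of(schema_url(repo_uri, iglu_key)) == 200


# Example call (placeholder URI, so it is left commented out):
# print(is_reachable("http://iglu.example.com", KEY))
```

A private S3 bucket will typically answer such a request with 403, which the Iglu client would report as not finding the schema even though Igluctl, using your AWS credentials, works fine.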