RDB Shredder 1.0.0 Iglu Config Error

Hey, I’m quite new to Snowplow. I’ve deployed the open source pipeline up to the S3 enriched sink, and now I’m trying to run dataflow-runner to shred the events, but it seems my resolver.json is not set up correctly.

resolver.json

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      }
    ]
  }
}

playbook.json

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1",
  "data": {
    "region": "eu-central-1",
    "credentials": {
      "accessKeyId": "default",
      "secretAccessKey": "default"
    },
    "steps": [
      {
        "type": "CUSTOM_JAR",
        "name": "S3DistCp enriched data archiving",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
        "arguments": [
            "--src", "s3://s3sink/enriched/good/",
            "--dest", "s3://s3sink/archive/enriched/run={{nowWithFormat "2006-01-02-15-04-05"}}/",
            "--s3Endpoint", "s3-eu-central-1.amazonaws.com",
            "--srcPattern", ".*",
            "--outputCodec", "gz",
            "--deleteOnSuccess"
        ]
      },

      {
        "type": "CUSTOM_JAR",
        "name": "RDB Shredder",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
            "spark-submit",
            "--class", "com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main",
            "--master", "yarn",
            "--deploy-mode", "cluster",
            "s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-1.0.0.jar",
            "--iglu-config", "{{base64File "/path/to/dataflow-runner/shredder/resolver.json"}}",
            "--config", "{{base64File "/path/to/dataflow-runner/shredder/config.hocon"}}"
        ]
      }
    ],
    "tags": [ ]
  }
}

error

{
    "schema": "iglu:com.snowplowanalytics.snowplow.badrows/loader_iglu_error/jsonschema/2-0-0",
    "data": {
        "processor": {
            "artifact": "snowplow-rdb-loader-common",
            "version": "1.0.0"
        },
        "failure": [
            {
                "schemaCriterion": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-*-*",
                "error": {
                    "error": "ResolutionError",
                    "lookupHistory": [
                        {
                            "repository": "Iglu Central",
                            "errors": [
                                {
                                    "error": "NotFound"
                                }
                            ],
                            "attempts": 1,
                            "lastAttempt": "2021-05-14T11:08:10.064Z"
                        },
                        {
                            "repository": "Iglu Client Embedded",
                            "errors": [
                                {
                                    "error": "NotFound"
                                }
                            ],
                            "attempts": 1,
                            "lastAttempt": "2021-05-14T11:08:10.064Z"
                        }
                    ]
                }
            }
        ],
        "payload": {...,
            "contexts": {
                "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
                "data": [
                    {
                        "schema": "iglu:org.w3/PerformanceTiming/jsonschema/1-0-0",
                        "data": {
                            "navigationStart": 1620986562657,
                            "redirectStart": 0,
                            "redirectEnd": 0,
                            "fetchStart": 1620986563055,
                            "domainLookupStart": 1620986563055,
                            "domainLookupEnd": 1620986563055,
                            "connectStart": 1620986563055,
                            "secureConnectionStart": 0,
                            "connectEnd": 1620986563055,
                            "requestStart": 1620986563056,
                            "responseStart": 1620986563250,
                            "responseEnd": 1620986563251,
                            "unloadEventStart": 0,
                            "unloadEventEnd": 0,
                            "domLoading": 1620986563264,
                            "domInteractive": 1620986563477,
                            "domContentLoadedEventStart": 1620986563712,
                            "domContentLoadedEventEnd": 1620986563712,
                            "domComplete": 1620986564427,
                            "loadEventStart": 1620986564451,
                            "loadEventEnd": 1620986564451
                        }
                    },
                    {
                        "schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0",
                        "data": {
                            "id": "d2040044-4329-4247-af09-cee1a9c5525e"
                        }
                    }
                ]
            },
            "unstruct_event": {
                "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
                "data": {
                    "schema": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
                    "data": {
                        "targetUrl": "https://web.com",
                        "elementId": "",
                        "elementClasses": [
                            "sc-eCssSg",
                            "bXughz",
                            "sc-jLiVlK",
                            "hmPzYI",
                            "btn"
                        ],
                        "elementTarget": ""
                    }
                }
            },
            "derived_contexts": {
                "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
                "data": [
                    {
                        "schema": "iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0",
                        "data": {
                            "useragentFamily": "Chrome",
                            "useragentMajor": "90",
                            "useragentMinor": "0",
                            "useragentPatch": "4430",
                            "useragentVersion": "Chrome 90.0.4430",
                            "osFamily": "Mac OS X",
                            "osMajor": "10",
                            "osMinor": "15",
                            "osPatch": "7",
                            "osPatchMinor": null,
                            "osVersion": "Mac OS X 10.15.7",
                            "deviceFamily": "Mac"
                        }
                    }
                ]
            },
            "domain_sessionid": "28940c40-bb53-46fd-80d0-d587bd673cc7",
            "derived_tstamp": "2021-05-14T11:02:45.012Z",
            "event_vendor": "com.snowplowanalytics.snowplow",
            "event_name": "link_click",
            "event_format": "jsonschema",
            "event_version": "1-0-1",
            "event_fingerprint": null,
            "true_tstamp": null

}

What am I doing wrong? I tried changing the jsonschema version in resolver.json from 1-0-0 to 1-0-3, but that didn’t make any difference.

@BenB Hey, sorry to tag you but I’ve seen you comment on another similar topic and I was wondering if you can take a look at my post as well because I’m pretty much stuck and need help with this. Can you help me?

Hi @Joao_Miguel_Santos ,

The issue is that shredder is trying to talk to an Iglu Server, whereas Iglu Central is a static Iglu registry.

Since R32 of RDB loader, it is required to use an Iglu Server in order to determine the order of the columns when shredding as TSV. An Iglu Server brings the possibility to list all the schemas for a major version, 1-*-* in your case.

We’ve updated the documentation to make it clear.

Thanks for the clarification @BenB. I’ve set up the Iglu Server on our Kubernetes cluster and pushed the schemas to it via igluctl. I successfully curled it from my local machine to make sure it was available. I then tried running the RDB Shredder, but it gave me the following error:

User class threw exception: java.lang.RuntimeException: RDB Shredder could not fetch iglu:com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0 schema at initialization. Schema cannot be resolved in following repositories:
* Iglu Central due [NotFound] after 1 attempt
* Iglu Client Embedded due [NotFound] after 1 attempt
at com.snowplowanalytics.snowplow.rdbloader.common.transformation.EventUtils$.$anonfun$getAtomicLengths$1(EventUtils.scala:70)
at shade.package$$anon$1.map(package.scala:81)
at shade.Functor$Ops.map(Functor.scala:233)
at shade.Functor$Ops.map$(Functor.scala:233)
at shade.Functor$ToFunctorOps$$anon$4.map(Functor.scala:250)
at com.snowplowanalytics.snowplow.rdbloader.common.transformation.EventUtils$.getAtomicLengths(EventUtils.scala:64)
at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob$.run(ShredJob.scala:196)
at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main$.main(Main.scala:41)
at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)

I also ssh’ed into the EMR cluster to confirm that the mentioned schema was available to it by running curl https://our-server-domain/api/schemas/com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0 -X GET on the shell successfully.

What am I doing wrong?

@Joao_Miguel_Santos Can you share the URI you’re using in your iglu_resolver.json file?

I experienced the same issue and was able to resolve it by formatting my URI in the resolver as follows:

"uri": "{{iglu-server-domain}}:{{port}}/api/"

It’s probably worth noting that @Colm told me it’s non-standard to include the port in the URI, but that was the only way it would work for me. You might try it yourself with and without the port and see if you have any luck. Hope that helps!

@samurijv2 thanks for the tip. I was following your thread, and just adding /api to the end of the URI solved the issue. I believe this should be documented, because it’s not clear that you need to add this suffix to the URI when self-hosting an Iglu Server.
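For anyone landing here later, this is roughly what a resolver.json pointing at a self-hosted Iglu Server could look like. It is only a sketch: the repository name is made up, and https://our-server-domain is a placeholder for your own server's address. The part that matters is the /api suffix on the uri.

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Self-hosted Iglu Server",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "https://our-server-domain/api"
          }
        }
      }
    ]
  }
}
```

Depending on your setup, you may need to include the port in the uri as well, as @samurijv2 describes above.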

You’re right, it should have been clearer. We’ve added it to the documentation:

[screenshot of the updated documentation]

Sorry for the trouble and glad that you got it to work!