RDB Loader cannot find jsonpath

Hi,

We managed to set up the shredder and the SQS queue so that the RDB Loader is notified after a shredding job has completed. However, the RDB Loader cannot find the JSONPath files for our events as specified in our resolver.json. We use the same resolver.json as for our enrichment module.
Error message:

2021-03-02 14:20:22 ERROR 2021-03-02 13:20:22.534: Data discovery error with following issues:
2021-03-02 14:20:22 JSONPath file [com.myapp/my_tracking_event_1.json] was not found
2021-03-02 14:20:22 JSONPath file [com.myapp/other_tracking_event_1.json] was not found
2021-03-02 14:20:21 INFO 2021-03-02 13:20:21.626: Received new message. Total 1 messages received, 0 loaded, 0 attempts has been made to load current folder
2021-03-02 13:53:43 INFO 2021-03-02 12:53:43.835: RDB Loader [myapp] has started. Listening sp-sqs-queue.fifo

We host the JSONPath files in an S3 bucket under exactly that path: s3://our-schema-repo/jsonpaths/com.myapp/my_tracking_event_1.json/.

Our resolver.json:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "S3-schemas-registry",
        "priority": 0,
        "vendorPrefixes": ["com.myapp"],
        "connection": {
          "http": {
            "uri": "SP_SCHEMA_URI" 
          }
        }
      }, 
      {
        "name": "Iglu Central",
        "priority": 1,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Iglu Central - Mirror 01",
        "priority": 2,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "http://mirror01.iglucentral.com"
          }
        }
      }
    ]
  }
}

SP_SCHEMA_URI is replaced by a CloudFront URI associated with the S3 bucket that hosts our schemas.

The RDB Loader runs as an AWS Fargate task, which should have permission to access the S3 bucket.
We did not find any field in config.hocon where the JSONPaths location should be set. Is that required?

{
  "name": "myapp",
  "id": "d5a4aab5-7b66-11eb-8ba2-acde48001122",

  "region": "eu-west-1",
  "messageQueue": "SQS_QUEUE",

  "shredder": {
    "input": "SP_ENRICHED_URI",
    "output": "SP_SHREDDED_GOOD_URI",
    "outputBad": "SP_SHREDDED_BAD_URI",
    "compression": "GZIP"
  },

  "formats": {
    "default": "JSON",
    "json": [ ],
    "tsv": [ ],
    "skip": [ ]
  },

  "storage" = {
    "type": "redshift",
    "host": "redshift.amazon.com",
    "database": "DATABASE",
    "port": 5439,
    "roleArn": "arn:aws:iam::AWS_ACCOUNT_NUMBER:role/RedshiftLoadRole",
    "schema": "atomic",
    "username": "DB_USER",
    "password": "DB_REDSHIFT_PASSWORD",
    "jdbc": {"ssl": true},
    "maxError": 10,
    "compRows": 100000
  },

  "steps": ["analyze"],

  "monitoring": {
    "snowplow": null,
    "sentry": null
  }
}

Hi @mgloel ,

Indeed, this field is required when using JSONPaths. You can set it with:

"jsonpaths": "s3://our-schema-repo/jsonpaths"

at the root of your config. Sorry, we forgot to add this to the example config and to our docs website, because the field is optional and we are moving away from JSON loading. We are adding it now. Thank you for noticing.
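
As a rough sketch of the placement, based on the config posted in the question: the fragment below shows only the top-level keys around the new field, and the remaining sections are assumed to stay exactly as posted. The bucket path is the one from the question and is assumed to be the folder that contains the vendor directories such as com.myapp.

```hocon
{
  "name": "myapp",
  "id": "d5a4aab5-7b66-11eb-8ba2-acde48001122",

  "region": "eu-west-1",
  "messageQueue": "SQS_QUEUE",

  # "shredder", "formats", "storage", "steps" and "monitoring" stay as posted in the question

  # Root-level field: base S3 folder that holds the JSONPath files
  "jsonpaths": "s3://our-schema-repo/jsonpaths"
}
```

With that base folder, the file named in the error message, com.myapp/my_tracking_event_1.json, would presumably be looked up at s3://our-schema-repo/jsonpaths/com.myapp/my_tracking_event_1.json.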

EDIT: related PR
