JSON Paths File not Found


#1

I’m getting the following error when running the StorageLoader on custom unstructured events:

Unexpected error: Cannot find JSON Paths file to load s3://snowplow-etl-data/shredded/good/run=2016-07-25-15-13-02/com.mycompany/gb-events-context/jsonschema/1- into atomic.com_mycompany_gb_events_context_1

However, I have already uploaded a JSON Paths file generated by Schema Guru, named “gb-events-context_1”, to an S3 bucket, inside a “com.mycompany” folder. The path to that location is specified in the configuration file as follows:

aws:
  # Credentials can be hardcoded or set in environment variables
  access_key_id: ***
  secret_access_key: ***
  s3:
    region: us-east-1
    buckets:
      assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
      jsonpath_assets: s3://my-schemas-bucket/jsonpaths 
      log: s3n://my-emr-bucket/logs
      raw:
        in:                  # Multiple in buckets are permitted
          - s3n://my-in-bucket      # e.g. s3://my-in-bucket
        processing: s3n://my-emr-bucket/processing
        archive: s3n://my-archive-bucket/raw    # e.g. s3://my-archive-bucket/in
      enriched:
        good: s3://my-out-bucket/enriched/good       # e.g. s3://my-out-bucket/enriched/good
        bad: s3://my-out-bucket/enriched/bad        # e.g. s3://my-out-bucket/enriched/bad
        errors: s3://my-out-bucket/enriched/errors     # Leave blank unless continue_on_unexpected_error: set to true below
        archive: s3://my-out-bucket/enriched/archive   # Where to archive enriched events to, e.g. s3://my-archive-bucket/enriched
      shredded:
        good: s3://my-out-bucket/shredded/good       # e.g. s3://my-out-bucket/shredded/good
        bad: s3://my-out-bucket/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
        errors: # Leave blank unless continue_on_unexpected_error: set to true below
        archive: s3://my-out-bucket/shredded/archive    # Where to archive shredded events to, e.g. s3://my-archive-bucket/shredded

I can’t figure out why the StorageLoader can’t find the JSON Paths file in the S3 bucket when everything has been specified. Is there something else that could be causing the problem?

Thanks in advance,
-Abhi


#2

Can you share the full S3 URI of the JSON Paths file you are referring to, @gbakulgod?


#3

Hi Alex,

I have the same issue. What exactly do you need to identify the problem?

Here is the JSON Paths file in question:
http://iglu.hipages.com.au/jsonpaths/au.com.hipagesgroup.hip/app_version_1.json

Error:

Unexpected error: Cannot find JSON Paths file to load s3://bucket-out/shredded/good/run=2016-12-16-04-07-26/au.com.hipagesgroup.hip/app_version/jsonschema/1- into atomic.au_com_hipagesgroup_hip_app_version_1

#4

Just to answer my own question: since I am using a static Iglu repository, I had only provided the S3 bucket itself in my configuration:

jsonpath_assets: s3://staticiglubucket

After checking the StorageLoader source code, I found that it expects the actual path that contains the JSON Paths files:

jsonpath_assets: s3://staticiglubucket/jsonpaths

That worked.
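
For anyone hitting the same error, here is a rough Python sketch of how the expected file location seems to follow from the error message. It is not the StorageLoader's own code. The layout `<jsonpath_assets>/<vendor>/<name>_<model>.json` is inferred from the file URL and the working setting in this thread, and the function name `expected_jsonpaths_uri` is made up for illustration. It uses the schema name exactly as it appears in the shredded path and does not attempt any renaming.

```python
import re

# Shape of the shredded path shown in the error message:
#   .../run=<timestamp>/<vendor>/<name>/jsonschema/<model>-
SHREDDED_PATH = re.compile(
    r"/run=[^/]+/(?P<vendor>[^/]+)/(?P<name>[^/]+)/jsonschema/(?P<model>\d+)-$"
)


def expected_jsonpaths_uri(jsonpath_assets: str, shredded_path: str) -> str:
    """Build the URI where the JSON Paths file is assumed to be looked up.

    Hypothetical helper: the layout is inferred from this thread,
    not taken from the StorageLoader source.
    """
    match = SHREDDED_PATH.search(shredded_path)
    if match is None:
        raise ValueError(f"Unrecognised shredded path: {shredded_path}")
    return "{}/{}/{}_{}.json".format(
        jsonpath_assets.rstrip("/"),  # tolerate a trailing slash in the config
        match["vendor"],
        match["name"],
        match["model"],
    )


shredded = (
    "s3://bucket-out/shredded/good/run=2016-12-16-04-07-26/"
    "au.com.hipagesgroup.hip/app_version/jsonschema/1-"
)

# Bucket root only: the "jsonpaths" folder is missing from the result
print(expected_jsonpaths_uri("s3://staticiglubucket", shredded))

# Folder that actually contains the vendor directories
print(expected_jsonpaths_uri("s3://staticiglubucket/jsonpaths", shredded))
```

Comparing the printed URI with the real location of the file in S3 should show quickly whether the `jsonpath_assets` setting points one level too high or too low.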