PostgresDB: Custom Context Missing

Good morning! Our pipeline is currently set up with three different loaders reading from our enriched stream:

  • Elasticsearch
  • S3
  • PostgresDB

While events are being loaded into all three data stores, our custom context is only making it into two of them. For example, we can see the following in both ES and S3. My understanding is that in PostgresDB a new table should be created and populated per context, but that table is never created.

contexts_com_my_org_app_context_1
{
  "env": "local"
}
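
For reference, the tracker attaches this context as a self-describing JSON along these lines (the Iglu URI here is assumed from the vendor, name and version in the schema further down):

{
  "schema": "iglu:my.org/app_context/jsonschema/1-0-0",
  "data": {
    "env": "local"
  }
}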

I’ve combed through the loader’s logs and found no errors. We are using the “atomic” database schema as referenced in the Snowplow documentation, and we have several other enrichments and default context tables populating correctly. Listed below is the configuration for the Postgres loader.

module "postgres_loader_enriched" {
  source = "snowplow-devops/postgres-loader-kinesis-ec2/aws"

  name       = "postgres-loader-enriched-server"
  vpc_id     = "<vpc_id>"
  subnet_ids = [<subnets>]
  in_stream_name = "enriched-stream"
  # Note: The purpose defines what the input data set should look like
  purpose = "ENRICHED_EVENTS"
  # Note: This schema is created automatically by the VM on launch
  schema_name = "atomic"

  ssh_key_name     = "snowplow-ssh"

  db_sg_id    = "<sg_id>"
  db_host     = "<url>"
  db_port     = 5432
  db_name     = "enriched"
  db_username = "<user>"
  db_password = "<pass>"
}

This is the schema for the additional global context:

{
    "$schema":"http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self":{
        "vendor":"my.org",
        "name":"app_context",
        "format":"jsonschema",
        "version":"1-0-0"
    },
    "description":"Generic schema for a additional data",
    "properties":{},
    "additionalProperties":true,
    "type":"object"
}

Hey @Katie62442

I’m not sure the Postgres loader will be able to support the generic schema you detailed (with additionalProperties: true).

The Postgres loader needs to be able to infer the table structure from the schema in order to create the table and therefore load the event.
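
To illustrate (with a hypothetical property, not one from your tracking): a typed definition like the fragment below gives the loader something it can translate into a concrete column (roughly, a bounded-length text column), whereas an empty properties object with additionalProperties: true gives it nothing to derive a table from.

{
    "properties": {
        "page_type": {
            "type": ["string", "null"],
            "maxLength": 128
        }
    },
    "additionalProperties": false
}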

We typically don’t advise these sorts of loose, non-validating schemas irrespective of where you are loading. Systems like Elasticsearch and S3 are very forgiving, but many warehouses are not, and your data analysts can’t know what to expect when they are presented with data loaded from such a generic schema.

I’d advise you to define a more strongly typed schema which matches the data you want to track in your applications (or perhaps multiple schemas).

Most Snowplow loaders would produce a failed (bad) event in this case, but as the Postgres loader is our least mature, I don’t believe it does at the moment. The Postgres loader is useful for simple pipelines and proofs of concept, but we don’t typically recommend it for more complex or high-volume use cases.

Hope that helps, happy to answer any follow ups.


Thank you so much for that insight! I have updated our schema to the following:

{
    "$schema":"http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self":{
        "vendor":"my.org",
        "name":"app_context",
        "format":"jsonschema",
        "version":"1-0-0"
    },
    "description":"Generic schema for additional data",
    "properties":{
        "env": {
            "description": "Environment for application",
            "type": ["string", "null"],
            "maxLength": 255
        }
    },
    "additionalProperties":false,
    "required": ["env"],
    "type":"object"
}

However, this doesn’t seem to be loading either. I also noticed that we’re not getting any rows in the atomic.events table for the applications sending this context, so the loader must be hitting an error it isn’t surfacing in the logs. I’ve validated the schema above, so I’m a little confused as to why this structure isn’t working.