Snowplow on Postgres - encoded fields, constraints, DISTKEYs and other doubts

Hello!

While working on a new JSON Schema for our structured events, we tried to build a new table to store them. Based on link_click_1.sql, we added our fields and ran the resulting DDL against our Postgres database.

The first problem we had was with the column compression encodings that are exclusive to Redshift: raw, runlength, text32k and text255.

Since we could not find any information on how to adapt these to Postgres, we removed them from the column declarations, but to no avail: Postgres then complained about the DISTKEY clause.
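
To illustrate that first adaptation: those keywords are Redshift column compression encodings (set with ENCODE), and we simply dropped the clause. A minimal sketch (the column is illustrative, not the full link_click_1.sql definition):

```sql
-- Redshift: ENCODE assigns a per-column compression; Postgres rejects the keyword.
-- CREATE TABLE link_click_test (target_url varchar(4096) ENCODE text32k NOT NULL);

-- Postgres: drop the ENCODE clause and keep the base type.
CREATE TABLE link_click_test (target_url varchar(4096) NOT NULL);
```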

We understand that DISTKEY is used by Redshift to distribute data across its nodes and that it is not a Postgres parameter, so we removed it as well.
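
Likewise, DISTSTYLE/DISTKEY (and SORTKEY) are table attributes that only Redshift accepts. A sketch of the shape we ended up with, using an abridged, illustrative column list rather than the exact link_click_1.sql fields:

```sql
CREATE SCHEMA IF NOT EXISTS atomic;

CREATE TABLE atomic.com_snowplowanalytics_snowplow_link_click_1 (
    schema_vendor varchar(128)  NOT NULL,
    schema_name   varchar(128)  NOT NULL,
    root_id       char(36)      NOT NULL,
    root_tstamp   timestamp     NOT NULL,
    target_url    varchar(4096) NOT NULL
    -- ... remaining link_click fields ...
);
-- Dropped from the Redshift original:
--   DISTSTYLE KEY
--   DISTKEY (root_id)
--   SORTKEY (root_tstamp)
```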

After removing the DISTKEY definition, Postgres complained that the FK constraint could not be created because there is no unique constraint on the referenced field of the referenced table. Since the events table was created using the SQL script provided by Snowplow, we believed this requirement would be satisfied.
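
As far as we can tell, Postgres only accepts a foreign key when the referenced column carries a PRIMARY KEY or UNIQUE constraint, while Redshift does not enforce constraints and so its DDL does not need to declare one. A minimal reproduction (table names hypothetical):

```sql
-- events_demo stands in for atomic.events; note event_id has no unique constraint.
CREATE TABLE events_demo (event_id char(36));

CREATE TABLE link_clicks_demo (
    root_id char(36) REFERENCES events_demo (event_id)
);
-- ERROR:  there is no unique constraint matching given keys
--         for referenced table "events_demo"
```

Adding a UNIQUE constraint on event_id makes the statement succeed, but we are unsure whether that is safe for Snowplow's events table.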

Our questions are:
Is there any DDL for creating com_snowplowanalytics_snowplow_link_click_1 on Postgres? And how can we satisfy the FK constraint?

If there is no DDL script to create this table on Postgres, how can we adapt the Redshift one to our needs?

And lastly, how do we create a dedicated table for our custom structured events?

Thank you in advance,
Ricardo

============

@rcpp, self-describing events and contexts are currently not supported with Postgres. You would need to use one of the following as your data store to have the “non-atomic” data loaded:

  • Redshift
  • Elasticsearch (real-time)
  • Snowflake
  • S3
  • BigQuery (GCP)

Hi! Thank you for your quick reply!

I see. We will see what we can do. Right now, we already have an infrastructure with Redshift, and we were playing around with Postgres to validate some ideas.

Cheers!