Loading Snowplow events into Redshift requires creating tables to store these values according to their schema. When I was searching for how to create the tables, I came across two options:
Generating the schemas using
Running the command
./igluctl.jar static generate schemas/where the
schemafolder contains the schemas from the iglu-cental repository and applying manually the create table scripts into Redshift.
Using the scripts from iglu-central
https://github.com/snowplow/iglu-central/tree/master/sql provides the create table scripts ready to run.
The problem is that when using the first option, I ran into an error on the
rdb_load step of the
`ERROR: Data loading error [Amazon](500310) Invalid operation: Number of jsonpaths and the number of columns should match. JSONPath size: 13, Number of columns in table or column list: 14`
It turns out that there’s a mismatch between the schema generated by igluctl and the hosted assets bucket (
The second option (using the ready-to-run scripts) worked out for me, as they were compatible with the schemas in the bucket.
What I want to know is if I am doing something wrong and why are the schemas (https://github.com/snowplow/iglu-central/tree/master/schemas) different from the create table scripts (https://github.com/snowplow/iglu-central/tree/master/sql).
Also, is there a way to automate the process of creating tables in Redshift?