Snowplow StorageLoader fails with message: Cannot COPY into nonexistent table "snowplow_duplicate_1"


#1

hi there!

I just completed a basic build of snowplow for testing and got it to run finally!
However a few days later I see this error on running the storage loader:
Unexpected error: Java::OrgPostgresqlUtil::PSQLException error executing COPY statements: BEGIN;

COPY atomic.events FROM 's3://anlytics-data/shredded/good/run=2017-01-30-16-26-15/atomic-events' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' REGION AS 'us-east-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ; COPY atomic.com_snowplowanalytics_snowplow_duplicate_1 FROM 's3://anlytics-data/shredded/good/run=2017-01-30-16-26-15/com.snowplowanalytics.snowplow/duplicate/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/duplicate_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ; COMMIT;: ERROR: Cannot COPY into nonexistent table com_snowplowanalytics_snowplow_duplicate_1 uri:classloader:/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:89:in `load_events_and_shredded_types' uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to' uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with' uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method' uri:classloader:/storage-loader/bin/snowplow-storage-loader:54:in `block in (root)' uri:classloader:/storage-loader/bin/snowplow-storage-loader:51:in `<main>' org/jruby/RubyKernel.java:973:in `load' uri:classloader:/META-INF/main.rb:1:in `<main>' org/jruby/RubyKernel.java:955:in `require' uri:classloader:/META-INF/main.rb:1:in `(root)' uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

I saw a few posts similar to this error and realize that this is due to missing tables based on the iglu definitions needed. However, its not clear which sql is needed to be added into the redshift db? I have not modified or added any custom or self describing schemas as yet - just using standard iglu schemas as per the resolver.conf file:

{ "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1", "data": { "cacheSize": 500, "repositories": [ { "name": "Iglu Central", "priority": 0, "vendorPrefixes": [ "com.snowplowanalytics" ], "connection": { "http": { "uri": "http://iglucentral.com" } } } ] } }

Is there a guide to which sql files are needed for the standard setup? And how to relate the various sqls with custom schemas? By finding the duplicate_1 sql in the iglu repo, I was able to make my StorageLoader work, but its possible that it could break at a later point when searching for another table not added into redshift from the iglu repo.

Thanks very much!


#2

Hi @kjain - glad you found the duplicate_1.sql table and deployed it to your Redshift.

There isn’t really such a thing as a “standard setup” for Snowplow, with an associated list of Redshift tables - it depends hugely on which trackers and enrichments you use. However, you are right - we should do better at documenting which tables are associated with each features.


#3

Thanks Alex!
Looking forward to that documentation :slight_smile:


#4

Hey All, I would like +1 the SQL documentation

I’ve just setup Snowplow on my local dev environment and pushing to Redshift and on first time running the storageloader, I had to manually insert the required tables. After a few Google searches I was able to find the Git repo linked above and add them.

Then after the second run this morning, I found more missing tables that I needed to manually insert again. Was easy finding the data and having them inserted, but would be nice to understand what are the required tables for a fresh Redshift setup and when we add enrichments, that we need to add the associated SQL.

Otherwise, Docs were simple enough to follow and get setup.


#5

Hi @TimLavelle,

Welcome to the forum!

Noted.