Storage loader error


#1

Hi,

We are trying to capture mailer activities like mail sent/open/click. For mail open we are using pixel tracker and for mail clicking we are trying to use uri_redirect method.

We are tracking these events using our own schema so they will get tracked as unstruct events. We set up JSON schema, jsonpath file and also created the redshift table using schema-guru.

After firing few events when we ran EmrEtlRunner, we are getting error from storage loader. Below is the error :

--------------------------------------------------------------------------------------------------------------------

**Loading Snowplow events and shredded types into Shaadi Redshift database (Redshift cluster)...**
**Unexpected error: Java::Default::PSQLException error executing COPY statements: BEGIN;**
**COPY atomic.events FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/atomic-events' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' REGION AS 'us-east-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_communication_tracking_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/communication_tracking/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/communication_tracking_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_help_me_write_this_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/help_me_write_this/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/help_me_write_this_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_client_session_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/client_session/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/client_session_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_geolocation_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/geolocation_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_mobile_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/mobile_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_screen_view_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/screen_view/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/screen_view_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_ua_parser_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/ua_parser_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_uri_redirect_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/uri_redirect/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/uri_redirect_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_web_page_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/web_page/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/web_page_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.org_w3_performance_timing_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/org.w3/PerformanceTiming/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/org.w3/performance_timing_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COMMIT;: ERROR: Manifest file is not in correct json format**
**  Detail: **
** -----------------------------------------------**
**  error:  Manifest file is not in correct json format**
**  code:      8001**
**  context:   Manifest file location = s3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/communication_tracking_1.json**
**  query:     8086302**
**  location:  s3_utility.cpp:293**
**  process:   padbmaster [pid=20258]**
** -----------------------------------------------**
**/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:88:in `load_events_and_shredded_types'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:54:in `(root)'**
**org/jruby/RubyArray.java:1613:in `each'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:51:in `(root)'**
**org/jruby/RubyKernel.java:1091:in `load'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**org/jruby/RubyKernel.java:1072:in `require'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**/tmp/jruby8883550174880908063extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'**
**Loading Snowplow events and shredded types into Shaadi Redshift database (Redshift cluster)...**
**Unexpected error: Java::Default::PSQLException error executing COPY statements: BEGIN;**
**COPY atomic.events FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/atomic-events' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' REGION AS 'us-east-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_communication_tracking_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/communication_tracking/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/communication_tracking_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_shaadi_help_me_write_this_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.shaadi/help_me_write_this/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-shaadi-assets/custom-events-jsonpaths/com.shaadi/help_me_write_this_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_client_session_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/client_session/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/client_session_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_geolocation_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/geolocation_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/geolocation_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_mobile_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/mobile_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/mobile_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_screen_view_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/screen_view/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/screen_view_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_ua_parser_context_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/ua_parser_context_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_uri_redirect_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/uri_redirect/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/uri_redirect_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.com_snowplowanalytics_snowplow_web_page_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/com.snowplowanalytics.snowplow/web_page/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/com.snowplowanalytics.snowplow/web_page_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COPY atomic.org_w3_performance_timing_1 FROM 's3://snowplow-etl-emr-runner/shredded/good/run=2016-10-14-20-32-38/org.w3/PerformanceTiming/jsonschema/1-' CREDENTIALS 'aws_access_key_id=xxxxxxxxxxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' JSON AS 's3://snowplow-hosted-assets-us-east-1/4-storage/redshift-storage/jsonpaths/org.w3/performance_timing_1.json' REGION AS 'us-east-1' MAXERROR 1 TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS ;**
**COMMIT;: ERROR: Cannot COPY into nonexistent table com_snowplowanalytics_snowplow_uri_redirect_1**
**/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:88:in `load_events_and_shredded_types'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:54:in `(root)'**
**org/jruby/RubyArray.java:1613:in `each'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:51:in `(root)'**
**org/jruby/RubyKernel.java:1091:in `load'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**org/jruby/RubyKernel.java:1072:in `require'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**/tmp/jruby6503417955291522470extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'**
**Loading Snowplow events and shredded types into Shaadi Redshift database (Redshift cluster)...**
**Unexpected error: Java::Default::PSQLException error executing ANALYZE statements: BEGIN;**
**ANALYZE atomic.events;**
**ANALYZE atomic.com_shaadi_communication_tracking_1;**
**ANALYZE atomic.com_shaadi_help_me_write_this_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_client_session_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_geolocation_context_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_mobile_context_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_screen_view_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_ua_parser_context_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_uri_redirect_1;**
**ANALYZE atomic.com_snowplowanalytics_snowplow_web_page_1;**
**ANALYZE atomic.org_w3_performance_timing_1;**
**COMMIT;: ERROR: skipping "com_shaadi_communication_tracking_1" --- only table or database owner can analyze it**
**/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/lib/snowplow-storage-loader/redshift_loader.rb:111:in `load_events_and_shredded_types'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:54:in `(root)'**
**org/jruby/RubyArray.java:1613:in `each'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/storage-loader/bin/snowplow-storage-loader:51:in `(root)'**
**org/jruby/RubyKernel.java:1091:in `load'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**org/jruby/RubyKernel.java:1072:in `require'**
**file:/home/ubuntu/emr-tools/snowplow-storage-loader!/META-INF/main.rb:1:in `(root)'**
**/tmp/jruby1139177695818426949extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'**
----------------------------------------------------------------------------------------------------------------------

Please help us with this.


#2

Hi @rahul,

Could you share the JSON path and JSON schema for the communication_tracking event?


#3

Hi @mike

Attaching the JSON Schema & JSON path file :

Also please go through the details for reference :

First time when we ran the storage loader it showed the invalid format for json path file, we checked that json path was indeed invalid then we corrected the file and uploaded and again ran the storage loader. this time we got an error regrading non existent table uri_redirect_1 so we created table and again ran the storage loader.

This time we got the error which I posted in the previous post. Hope this will help you to figure out what could be the problem.

JSON Schema

{
	"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
	"description": "Schema for communication activity tracking",
	"self": {
		"vendor": "com.shaadi",
		"name": "communication_tracking",
		"format": "jsonschema",
		"version": "1-0-0"
	},

	"type" : "object",
	  "properties" : {
		"sender" : {
			"type" : "string"
		},
		"receiver" : {
			"type" : "string"
		},
		"communication_type" : {
			"type" : "string"
		},
		"category1" : {
			"type" : "string"
		},
		"category2" : {
			"type" : "string"
		},  
		"category3" : {
			"type" : "string"
		},
		"action" : {
			"type" : "string"
		},   
		"hash_id" : {
			"type" : "string"
		},
		"receiver_address" : {
			"type" : "string"
		},
		"extra_info1" : {
			"type" : "string"
		},
		"extra_info2" : {
			"type" : "string"
		}
	},
	"additionalProperties" : false
}

JSON Path

{
    "jsonpaths": [
        "$.schema.vendor",
        "$.schema.name",
        "$.schema.format",
        "$.schema.version",

        "$.hierarchy.rootId",
        "$.hierarchy.rootTstamp",
        "$.hierarchy.refRoot",
        "$.hierarchy.refTree",
        "$.hierarchy.refParent",

	"$.data.sender",
        "$.data.receiver",
        "$.data.communication_type",
        "$.data.category1",
        "$.data.category2",
        "$.data.category3",
        "$.data.action",
        "$.data.hash_id",
        "$.data.receiver_address",
        "$.data.extra_info1",
        "$.data.extra_info2"
    ]
}

#4

@mike

Also I just checked we have events in communication_tracking table and uri_redirect table but still storage loader is showing error on running?

Why this is happening? And is there risk of data duplication in other tables due to rerunning storage loader?

Thanks


#5

The JSON schema and JSON path look valid but the error about the non-existent table suggests that it’s trying to load into the uri_redirect_1 from the snowplow.snowplowanalytics.com vendor rather than your custom vendor (com_shaadi_uri_redirect_1). Are you sending the event payload with that vendor?

You’ll also want to make sure that any tables you are creating manually are owned by the storageloader user so you don’t run into permissions issues when the data attempts to load.


#6

@mike

I tried sending our unstruct event payload with uri_redirect please see the below URL :

href=“http://[Collecto_uri]/r/tp2?u=[URL_to_redirect]&sender=rahul&reciever=testuser&communication_type=mail_click&category1=link_click&category2=without_schema&category3=dummy&action=click&hash_id=xyz_123&receiver_address=rahul@testmail.com&extra_info1=testmail&extra_info2=testmail”>Click here"

Is this wrong?

I want to capture all the details with uri_redirect rather than just uri where it was redirected.

Also please suggest us if the above way of sending data payload is wrong then how to send unstruct event with uri_redirect and what should we do to run storage loader for now?


#7

For self describing (unstruct) events you’ll want to send the payload in the format shown here rather than just sending the data as url params.

If you’re looking to quickly test schemas against the events you are sending Snowplow Mini is quick to set up and provides a relatively short feedback loop if anything is failing. Snowplow Mini will sink your events into Elasticsearch (into good/bad), but not Redshift.


#8

@mike

Thanks

But still there are few un-answered questions. I want to send unstruct event with uri_redirect, is this possible? As we do in pixel tracker, we send schema name in the query parameter with data payload? can we do the same with uri_redirect?

And what should we do to run storage loader for now? Will re-running be enough? As we tried re-running earlier but it gave us the error I posted in the first post, so how should we re-run it?

With re-running storage loader is there risk of data duplication in other tables like atomic.events etc?

Thanks


#9

Hi @rahul - you need to create the missing uri_redirect table I think?


#10

maybe you try loading with a manual executed copy statement, it’s easier to figure out if the jsonpath file and so on is correct.
the error:

ERROR: skipping “com_shaadi_communication_tracking_1” — only table or database owner can analyze it

can fixed by creating the table as the storageloader user (recommended) or setting the permissions for this user or by the --skip-analyze option.
maybe you can also share an example event, i would also suggest the vendor in it is wrong


#11

Yes, you can do this. The name-value pairs on the querystring are just the regular Snowplow Tracker Protocol: