Can not upload data to redshift due double value in event


#1

I have schema for unstructed event:

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for subscription purchase event",
  "self": {
    "vendor": "com.custom",
    "name": "purchase",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "coupon": {
      "description": "Applied coupon",
      "type": ["string", "null"],
      "maxLength": 255
    },
    "couponValue": {
       "description": "The nominal of the discount in $",
       "type": ["number", "null"],
       "minimum": 0,
       "maximum": 2147483647
    },
    "subscriptionId": {
        "description": "subscriptionID",
        "type": ["integer", "null"],
        "minimum": 0,
        "maximum": 9223372036854775807
    },
    "purchaseValue": {
       "description": "Price in $",
       "type": ["number", "null"],
       "minimum": 0,
       "maximum": 2147483647
    },
    "subscriptionType": {
        "description": "Subscription type",
        "type": ["string", "null"],
        "maxLength": 100
    },
    "subscriptionPlan": {
        "description": "Subscription plan",
        "type": ["string", "null"],
        "maxLength": 100
    },
    "paymentMethod": {
        "description": "Payment type",
        "type": ["string", "null"],
        "maxLength": 100
    },
    "offer": {
        "description": "Offer applied during subscription",
        "type": ["string", "null"],
        "maxLength": 100
    }
  },
  "required": ["subscriptionId", "purchaseValue", "subscriptionType", "subscriptionPlan", "paymentMethod"],
  "additionalProperties": false
}

And incoming event that is parsed by snowplow to such values:

"data":{"subscriptionPlan":"1","purchaseValue":14.95,"subscriptionId":1234,"subscriptionType":"order","paymentMethod":"Card","couponValue":0.0,"offer":"order"}

Redshift table DDL generated accroding to schema:

create table com.custom_subscription_purchase_1
(
	schema_vendor varchar(128) not null encode runlength,
	schema_name varchar(128) not null encode runlength,
	schema_format varchar(128) not null encode runlength,
	schema_version varchar(128) not null encode runlength,
	root_id char(36) not null distkey
		constraint com.custom_subscription_purchase_1_root_id_fkey
			references events,
	root_tstamp timestamp not null,
	ref_root varchar(255) not null encode runlength,
	ref_tree varchar(1500) not null encode runlength,
	ref_parent varchar(255) not null encode runlength,
	payment_method varchar(100),
	coupon varchar(255),
	coupon_value double precision,
	offer varchar(100),
	purchase_value double precision,
	subscription_id bigint,
	subscription_plan varchar(100),
	subscription_type varchar(100)
)
diststyle key
sortkey(root_tstamp)
;

During upload to redshift I get such error regarding field coupon_value :

Invalid digit, Value 'o', Pos 0, Type: Double                                                       

What is going on? Am I correct that for some reason start of next field name (offer) considered as first digit of couponValue? Should I convert double fields to varchar and then cast it on database level?


#2

@sphinks, how does your JSONPaths file look like? The order of jsonpaths have to match the order of your table columns. Take a look at this post explaining the significance of jsonpaths.

The error suggests that either subscriptionType or offer (both of which have a value starting with o) are attempted to be loaded into a column defined as double precision (either coupon_value or purchase_value). That is the jsonpaths do not follow the same order as the columns in your table definition.

Make sure the JSONPaths file reflects both the number of columns of your table and their order.


#3

@ihor jsonpath looks like this:

{
    "jsonpaths": [
        "$.schema.vendor",
        "$.schema.name",
        "$.schema.format",
        "$.schema.version",
        "$.hierarchy.rootId",
        "$.hierarchy.rootTstamp",
        "$.hierarchy.refRoot",
        "$.hierarchy.refTree",
        "$.hierarchy.refParent",
        "$.data.coupon",
        "$.data.couponValue",
        "$.data.offer",
        "$.data.paymentMethod",
        "$.data.purchaseValue",
        "$.data.subscriptionId",
        "$.data.subscriptionPlan",
        "$.data.subscriptionType"
    ]
}

Thanks, I will dig into the document.


#4

I see misposition in DDL and JSON path, seem like it is cause due manual editing of DDL. Issue resolved, thanks @ihor.