Derived_tstamp is negative


#1

Hi,

We’ve got a relatively new Snowplow deployment (2 weeks in prod), close to the out-of-the-box config: Scala collector -> Kinesis -> S3 sink -> EMR enrich and shred -> load to Redshift.

We’ve had a couple of instances where the Redshift load fails because of an invalid negative value for derived_tstamp.

e.g. derived_tstamp=-5967-09-11 11:10:28.569

Can someone tell me why this might have happened and how to stop it happening?


#2

Hi @lionelport - thanks for raising this. The design of our derived_tstamp is explained in the blog post, Improving Snowplow’s Understanding of Time.

For one of the events which is failing to load, could you share:

  • dvce_created_tstamp
  • dvce_sent_tstamp
  • collector_tstamp

Thanks!


#3

Hi @alex, the timestamps are below. It looks like the device sent timestamp is bizarrely out of whack with the device created timestamp. I'm not sure why that would be; the browser user-agent looks like Chrome on Win7.

etl_tstamp=2016-07-24 19:07:23.018
collector_tstamp=2016-07-24 17:16:27.024
dvce_created_tstamp=2016-07-19 22:08:14.652
dvce_sent_tstamp=9999-06-02 04:14:13.107
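
As a rough check, here is a minimal sketch that plugs these values into the formula as I understand it from the blog post: derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp). The function name derivedTstamp is just for illustration and is not part of any Snowplow library. The result lands in a negative year, which is consistent with the value that broke the load.

```javascript
// Timestamps from the failing event, as UTC epoch milliseconds.
// Note: Date.UTC months are zero-based (6 = July, 5 = June).
const collectorTstamp = Date.UTC(2016, 6, 24, 17, 16, 27, 24);
const dvceCreatedTstamp = Date.UTC(2016, 6, 19, 22, 8, 14, 652);
const dvceSentTstamp = Date.UTC(9999, 5, 2, 4, 14, 13, 107);

// Assumed formula: shift the collector time back by the delay between
// the event being created and being sent, both measured on the device clock.
function derivedTstamp(collector, created, sent) {
  return collector - (sent - created);
}

const derived = derivedTstamp(collectorTstamp, dvceCreatedTstamp, dvceSentTstamp);

// A "send delay" of almost 8000 years pushes the result far before year 0.
console.log(new Date(derived).toISOString());
```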


#4

That’s interesting - the JavaScript library just uses new Date().getTime() to set dvce_sent_tstamp (stm), which should just be an epoch timestamp. Which version of the JavaScript library are you running?


#5

We’re running JS 2.6.2.

We’ve got hundreds of millions of records in the events table and only 2 instances of this happening, so it doesn’t seem like a typical issue. Possibly someone is manually messing with us, and I’d be happy to drop the record or set the derived_tstamp to the same value as the collector_tstamp. The main issue is that one bad record, which could be spoofed from the browser, will break the ETL job. Can we put some enforcement in place to make sure the derived_tstamp is valid for loading into Redshift?
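
To illustrate the kind of enforcement I mean, here is a minimal sketch, not an existing Snowplow feature. The function safeDerivedTstamp and its bounds (not before 1970, not more than a day after the collector time) are assumptions I picked for the example; it falls back to collector_tstamp whenever the derived value looks implausible.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical guard: return the derived timestamp when it is plausible,
// otherwise fall back to the collector timestamp.
// All arguments and the return value are UTC epoch milliseconds.
function safeDerivedTstamp(collector, created, sent) {
  const derived = collector - (sent - created);
  const plausible =
    Number.isFinite(derived) &&
    derived >= 0 &&                     // assumed lower bound: 1970-01-01
    derived <= collector + ONE_DAY_MS;  // assumed upper bound: collector + 1 day
  return plausible ? derived : collector;
}

const collector = Date.UTC(2016, 6, 24, 17, 16, 27, 24);
const created = Date.UTC(2016, 6, 19, 22, 8, 14, 652);

// Broken or spoofed dvce_sent_tstamp: falls back to collector_tstamp.
const badSent = Date.UTC(9999, 5, 2, 4, 14, 13, 107);
console.log(new Date(safeDerivedTstamp(collector, created, badSent)).toISOString());

// Sane dvce_sent_tstamp, one second after creation: derived value is kept.
const goodSent = created + 1000;
console.log(new Date(safeDerivedTstamp(collector, created, goodSent)).toISOString());
```

Dropping the record instead of falling back would work just as well for us; the point is only that the bad value never reaches the Redshift load.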


#6

Hi @lionelport - currently all you can do is bump maxerror in the Redshift load configuration to a number like 10 or, better, 1000.


#7

Sorry for replying to this old post, but will there be a filter mechanism or something similar implemented in the future?
This would be really appreciated by my team :slight_smile: