Serializable isolation violation on table


#1

so this is the second time this has happened and very hard to get snow plow ETL going again. We found out why it happened and will make sure that other process doesn’t run at the same time as load. Last time this happened, i tried running StorageLoader (we are on R90 of course) and it would complete the load with no errors. When i tried running the whole process, it would keep complaining about StorageLoader wanting to be run again. I tried all the usual flag combos (–skip staging etc) and nothing got it going again. i had to resort to cleaning up enriched/shredded folders and have it re-process the raw logs again. but the records still got loaded into redshift but with unique event ids and unique event finger print ids. so makes it hard to de-duplicate. anyone have any suggestions on the cleanest way to get this going again without doing what i had to do above? Thanks everyone.

ERROR: Data loading error Amazon Invalid operation: 1023
Details:
Serializable isolation violation on table - 4237826, transactions forming the cycle are: 41525643, 41526274 (pid:10734);
Following steps completed: [Discover]
INFO: Logs successfully dumped to S3 [s3://ga-snowplow-production/snowplow-log/rdb-loader/2017-09-28-14-07-04/5c29862e-ee7c-45c6-90ef-38d78eed49f6]

This article was very helpful of course to find the culprit and why it happened:


#2

Hello @mjensen,

Before we dive deeper into an issue, could you please clarify one thing about toolset. You said that you’re on R90, while still mention StorageLoader. R90 deprecated StorageLoader and replaced it with RDB Loader, which is now launching automatically as part of EMR process, so you don’t need to launch anything manually.

For me this error means that you’re trying to load in some table in two concurrent transactions. I might be wrong, of course, but just in case can you confirm that:

  • You’re using only one load tool. Preferably RDB Loader 0.13
  • You don’t have any other transactions started. For example in console or from another loading tool running concurrently

#3

@anton yes sorry, not using StorageLoader, i’m just used to calling that process StorageLoader :slight_smile: i’m of course using EmrEtlRunner to kick off just the load portion when it complains.

i’m on 0.12 though. i can look at upgrading to latest version if you think it will help. I see R92 is out so probably a good idea for us to upgrade today? https://snowplowanalytics.com/blog/2017/09/11/snowplow-r92-maiden-castle-released-improving-emretlrunner/

storage:
versions:
rdb_loader: 0.12.0
rdb_shredder: 0.12.0
hadoop_elasticsearch: 0.1.0

how i kick off just loader:
[root@ip-XXX-XXX-XXX-XXX bin]# /var/app/current/bin/snowplow-emr-etl-runner --config /var/app/current/etc/emr-config.yml --resolver /var/app/current/etc/resolver.conf --enrichments /var/app/current/enrichments --targets /var/app/current/etc/storage_targets --skip staging,enrich,shred,elasticsearch,archive_raw,analyze,archive_enriched
D, [2017-09-28T17:45:46.826000 #9908] DEBUG – : Initializing EMR jobflow
D, [2017-09-28T17:45:50.591000 #9908] DEBUG – : EMR jobflow j-1GNVN1ZOZIJHR started, waiting for jobflow to complete…
I, [2017-09-28T17:45:50.602000 #9908] INFO – : SnowplowTracker::Emitter initialized with endpoint http://sp.generalassemb.ly:80/i
I, [2017-09-28T17:45:51.643000 #9908] INFO – : Attempting to send 1 request
I, [2017-09-28T17:45:51.662000 #9908] INFO – : Sending GET request to http://sp.generalassemb.ly:80/i
I, [2017-09-28T17:45:51.677000 #9908] INFO – : GET request to http://sp.generalassemb.ly:80/i finished with status code 200
I, [2017-09-28T17:55:54.422000 #9908] INFO – : RDB Loader logs
D, [2017-09-28T17:55:55.034000 #9908] DEBUG – : Downloading s3://ga-snowplow-production/snowplow-log/rdb-loader/2017-09-28-17-45-46/3b7a015b-7a3d-4d57-a71e-ef3079546b6d to /var/app/current/bin/rdbloader20170928-9908-1e5l8h
I, [2017-09-28T17:55:59.028000 #9908] INFO – : Data Warehouse
I, [2017-09-28T17:55:59.028000 #9908] INFO – : RDB Loader successfully completed following steps: [Discover, Load]
D, [2017-09-28T17:55:59.029000 #9908] DEBUG – : EMR jobflow j-1GNVN1ZOZIJHR completed successfully.
I, [2017-09-28T17:55:59.657000 #9908] INFO – : Attempting to send 1 request
I, [2017-09-28T17:55:59.665000 #9908] INFO – : Sending GET request to http://sp.generalassemb.ly:80/i
I, [2017-09-28T17:55:59.680000 #9908] INFO – : GET request to http://sp.generalassemb.ly:80/i finished with status code 200
I, [2017-09-28T17:55:59.683000 #9908] INFO – : Completed successfully

and this is what i keep getting even after running Loader:
Error running EmrEtlRunner, exiting with return code 4. StorageLoader not run


#4

looking at the R92 release notes, i think we were affected by that bug mentioned in the article below and probably what is causing some issues for me now as well.

dev=> SELECT * FROM atomic.manifest WHERE etl_tstamp BETWEEN ‘2017-07-27’ AND ‘2017-09-29’ AND shredded_cardinality = 0;
etl_tstamp | commit_tstamp | event_count | shredded_cardinality
-------------------------±---------------------------±------------±---------------------
2017-08-08 19:03:24.783 | 2017-08-08 21:56:40.57218 | 673279 | 0
2017-08-08 19:03:24.783 | 2017-08-08 23:20:50.459025 | 1346558 | 0
2017-08-08 19:03:24.783 | 2017-08-09 15:12:48.290994 | 2019837 | 0
2017-09-13 00:07:13.448 | 2017-09-13 13:49:03.065501 | 16718 | 0
2017-09-13 00:07:13.448 | 2017-09-13 14:23:31.142434 | 25077 | 0
2017-09-13 00:07:13.448 | 2017-09-13 14:39:53.088461 | 33436 | 0
2017-09-13 00:07:13.448 | 2017-09-13 15:22:49.999815 | 41795 | 0
2017-09-13 00:07:13.448 | 2017-09-13 15:38:42.900336 | 50154 | 0
2017-09-13 00:07:13.448 | 2017-09-13 15:51:55.204104 | 58513 | 0
2017-09-28 14:07:04.974 | 2017-09-28 17:27:55.663084 | 142461 | 0
2017-09-28 14:07:04.974 | 2017-09-28 17:53:53.1087 | 284922 | 0
(11 rows)


#5

Sorry to hear about @mjensen! Please let me know if you need any help with recovering or clarification regarding that post.

It’s not entirely clear to me how it could result in “Serializable isolation violation on table”, but it’s generally a very good idea to resolve this issue first and update to R92.


#7

@anton thanks, upgraded and trying to recover now. i’ll let you know.
liking the new “–resume-from” flag :slight_smile:


#8

@anton so none of the resume flags worked so i went back to just cleaning out enriched and shredded good folders and starting from scratch. but now i’m getting this and have no clue why. both shredded and enriched good folders are completely empty.

/var/app/current/bin/snowplow-emr-etl-runner run --config /var/app/current/etc/emr-config.yml --resolver /var/app/current/etc/resolver.conf --enrichments /var/app/current/enrichments --targets /var/app/current/etc/storage_targets

D, [2017-09-29T11:30:39.453000 #6943] DEBUG – : Initializing EMR jobflow
There seems to be an ongoing run of EmrEtlRunner: Cannot safely add enrichment step to jobflow, s3://ga-snowplow-production/snowplow-enriched/good/ is not empty
Error running EmrEtlRunner, exiting with return code 4. StorageLoader not run



#9

@mjensen what was period between you cleaned up buckets and launched EmrEtlRunner? I hope it was just a short S3 eventual consistency issue. Could you retry again in 10-30 mins? Or having that S3 console gives you no keys - you can try it right now.


#10

i know there is a period of time i shouldn’t trust s3 folder/files are up to date so i waited 5 minutes before kicking it off after cleaning up. i’ll try again in 30 mins.


#12

@anton quick update. we are good. we had another process updating that folder by mistake which was throwing snow plow off. we’ve moved that process to it’s own folder from now on. thanks for all your help.