I’m trying to set up a test Snowplow pipeline (JS Tracker -> Scala collector -> Kinesis -> S3 -> Redshift) with EmrEtlRunner R92 and RDB Loader 0.13.0.
I have eight custom self-describing event types. I used igluctl to create both the Redshift tables and the JSONPaths files (I had to rename the JSONPaths files because my event names contain a dash, but that’s sorted now). My buffer limits are very low for testing, but there are still fewer than 100 files, all under 50KB, in both enriched/good/* and shredded/good/*.
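The file counts and sizes above can be double-checked from an S3 listing. The following is a minimal sketch that assumes the listing has already been fetched (for example, the `Contents` entries from boto3’s `list_objects_v2`, each of which carries `Key` and `Size`). The function name `summarize_listing`, the sample keys, and the 50KB threshold interpreted as 50 * 1024 bytes are all hypothetical.

```python
from typing import Iterable, Mapping


def summarize_listing(objects: Iterable[Mapping], size_limit: int = 50 * 1024) -> dict:
    """Summarize S3 listing entries: file count, total bytes, largest file,
    and how many files exceed size_limit (assumed 50KB = 50 * 1024 bytes)."""
    # Skip zero-byte "folder" placeholder keys that end with a slash.
    sizes = [int(obj["Size"]) for obj in objects if not obj["Key"].endswith("/")]
    return {
        "files": len(sizes),
        "total_bytes": sum(sizes),
        "largest_bytes": max(sizes, default=0),
        "over_limit": sum(1 for s in sizes if s > size_limit),
    }


if __name__ == "__main__":
    # Hypothetical entries shaped like list_objects_v2 "Contents" items.
    sample = [
        {"Key": "shredded/good/run=example/", "Size": 0},
        {"Key": "shredded/good/run=example/part-00000", "Size": 12000},
        {"Key": "shredded/good/run=example/part-00001", "Size": 8000},
    ]
    print(summarize_listing(sample))
```

Running it against each of the enriched/good and shredded/good prefixes gives a quick view of whether the run folders really are as small as expected.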
RDB Loader finally runs without throwing an error (I’m running with the flag -f rdb_load), but it has been running for 2 hours now and still hasn’t finished.
I found this issue mentioned here: https://github.com/snowplow/snowplow-rdb-loader/issues/26
Could I be doing something wrong, or is this normal? Does this time increase much as the number of events grows? What’s the typical run time of this step for you guys? Are there any recommendations for reducing it? (It shouldn’t take that long for ~1000 events.)