Kinesis S3 files not exceeding 30MB each


#1

Hey Snowplowers, we are running our Kinesis S3 Sync 0.4.0 and have started increasing load. We noticed that the size of our lzo files in S3 does not change when we change the buffer config.

Example: we had:

   byte-limit: 128000000 # 128MB
   record-limit: 100000 # 100k records
   time-limit: 7200000 # 2 hours

and were getting roughly 30MB files dropped to S3. I increased the record limit (assuming, based on frequency and file size, that this was the limit being hit) to the following:

   byte-limit: 128000000 # 128MB
   record-limit: 400000 # 400k records
   time-limit: 7200000 # 2 hours

but we are still getting 30MB files.

Has anyone else had this issue, or is there anywhere I can look? The S3 Sync server is nowhere near hitting peak, the Kinesis stream is only operating at 1/5 of its limit, and the Dynamo table looks fine.

Any help is greatly appreciated,

Thanks!
-David

Whole config file (with personal data withheld)

    sink {
      aws {
        access-key: "iam"
        secret-key: "iam"
      }
      kinesis {
        in {
          stream-name: "STREAM NAME HERE" 
          initial-position: "TRIM_HORIZON"
          max-records: "100"
        }
        out {
          stream-name: "STREAM NAME HERE" 
          shards: "1"
        }
        region: "REGION HERE"
        app-name: "STREAM NAME HERE"
      }
      s3 {
        region: "REGION HERE"
        endpoint: "http://s3-REGION HERE.s3.amazonaws.com"
        bucket: "BUCKET NAME"
        format: "lzo"
        max-timeout: "30000"
      }
      buffer {
        byte-limit: 128000000 # 128MB 
        record-limit: 400000 # 400k records
        time-limit: 7200000 # 2 hours
      }
      logging {
        level: "ERROR"
      }
    }

#2

Hi @13scoobie,

That’s super odd:

  • You’re clearly not hitting the 128MB threshold
  • I doubt you are hitting the 2 hour (!) threshold
  • If you were hitting the record threshold (100,000), then that would imply your events are only 300 bytes each (30MB / 100,000), which is out by a factor of 10
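
The arithmetic behind that last point can be checked with a quick sketch. The 30MB figure is the approximate file size reported above, and the helper name is just for illustration:

```python
def implied_event_size(file_size_bytes, record_limit):
    """Average bytes per event if a file of this size held exactly record_limit events."""
    return file_size_bytes / record_limit

file_size_bytes = 30_000_000  # roughly 30MB per file, as reported above
for record_limit in (100_000, 400_000):  # the two record-limit values tried
    size = implied_event_size(file_size_bytes, record_limit)
    print(f"record-limit {record_limit}: {size:.0f} bytes per event")
```

Note this works on the compressed file size, so it only gives a lower bound on the real event size.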

Can you share a listing from your S3 bucket with timestamps so we can get a sense of how many nodes you are running and how frequently these files are arriving in S3.

Thanks


#3

Certainly. We ran the batch not too long ago, so here are the raw files since the last batch:

2016-06-30 12:00:27   35417753 2016-06-30-49560854836580187887447024381070193992988435592526168130-49560854836580187887447024824997426362037286595662446658.lzo
2016-06-30 12:00:31       4224 2016-06-30-49560854836580187887447024381070193992988435592526168130-49560854836580187887447024824997426362037286595662446658.lzo.index
2016-06-30 12:01:15   35503855 2016-06-30-49560855358216918826273542476096764762438383403932844082-49560855358216918826273542523127605922906384169862955058.lzo
2016-06-30 12:01:19       4224 2016-06-30-49560855358216918826273542476096764762438383403932844082-49560855358216918826273542523127605922906384169862955058.lzo.index
2016-06-30 12:01:33   35334499 2016-06-30-49560854822954432571144813957023586056429788434263965714-49560854822954432571144814362845476591225515412436287506.lzo
2016-06-30 12:01:36       4216 2016-06-30-49560854822954432571144813957023586056429788434263965714-49560854822954432571144814362845476591225515412436287506.lzo.index
2016-06-30 12:07:24   35363993 2016-06-30-49560854847908966448300582889946479133421522772266319874-49560854847908966448300583294440969118280378945151959042.lzo
2016-06-30 12:07:27       4224 2016-06-30-49560854847908966448300582889946479133421522772266319874-49560854847908966448300583294440969118280378945151959042.lzo.index
2016-06-30 12:08:07   35412739 2016-06-30-49561016824153341599026057090934683914156040887168139298-49561016824153341599026058646892331455721514604280414242.lzo
2016-06-30 12:08:09       4216 2016-06-30-49561016824153341599026057090934683914156040887168139298-49561016824153341599026058646892331455721514604280414242.lzo.index
2016-06-30 12:09:06   35371894 2016-06-30-49560854836580187887447024824998635287856901224837152834-49560854836580187887447025236752724819501612532500529218.lzo
2016-06-30 12:09:11       4224 2016-06-30-49560854836580187887447024824998635287856901224837152834-49560854836580187887447025236752724819501612532500529218.lzo.index
2016-06-30 12:09:17   35433727 2016-06-30-49560855358216918826273542523128814848725998799037661234-49560855358216918826273542569674876755528530723289956402.lzo
2016-06-30 12:09:19       4216 2016-06-30-49560855358216918826273542523128814848725998799037661234-49560855358216918826273542569674876755528530723289956402.lzo.index
2016-06-30 12:10:38   35465376 2016-06-30-49560854822954432571144814362846685517045130041610993682-49560854822954432571144814775705733247817610971846803474.lzo
2016-06-30 12:10:41       4224 2016-06-30-49560854822954432571144814362846685517045130041610993682-49560854822954432571144814775705733247817610971846803474.lzo.index
2016-06-30 12:16:22   35333729 2016-06-30-49560854847908966448300583294442178044099993574326665218-49560854847908966448300583691201960635064449950979981314.lzo
2016-06-30 12:16:23   35216347 2016-06-30-49561016824153341599026058646893540381541129233455120418-49561016824153341599026060286120789452342648830091591714.lzo
2016-06-30 12:16:26       4224 2016-06-30-49561016824153341599026058646893540381541129233455120418-49561016824153341599026060286120789452342648830091591714.lzo.index
2016-06-30 12:16:28       4224 2016-06-30-49560854847908966448300583294442178044099993574326665218-49560854847908966448300583691201960635064449950979981314.lzo.index
2016-06-30 12:17:31   35436046 2016-06-30-49560854836580187887447025236753933745321227161675235394-49560854836580187887447025641312496798619662460391522370.lzo
2016-06-30 12:17:32   35338527 2016-06-30-49560855358216918826273542569676085681348145352464662578-49560855358216918826273542616345458021751368215586406450.lzo
2016-06-30 12:17:35       4216 2016-06-30-49560855358216918826273542569676085681348145352464662578-49560855358216918826273542616345458021751368215586406450.lzo.index
2016-06-30 12:17:36       4224 2016-06-30-49560854836580187887447025236753933745321227161675235394-49560854836580187887447025641312496798619662460391522370.lzo.index
2016-06-30 12:19:19   35430190 2016-06-30-49560854822954432571144814775706942173637225601021509650-49560854822954432571144815168092831149235956990842241042.lzo
2016-06-30 12:19:24       4224 2016-06-30-49560854822954432571144814775706942173637225601021509650-49560854822954432571144815168092831149235956990842241042.lzo.index
2016-06-30 12:23:53   35273753 2016-06-30-49561016824153341599026060286121998378162263459266297890-49561016824153341599026061846937109862939315551080546338.lzo
2016-06-30 12:23:56       4224 2016-06-30-49561016824153341599026060286121998378162263459266297890-49561016824153341599026061846937109862939315551080546338.lzo.index
2016-06-30 12:24:06   35376353 2016-06-30-49560854847908966448300583691203169560884064580154687490-49560854847908966448300584094486315874489058884010901506.lzo
2016-06-30 12:24:10       4224 2016-06-30-49560854847908966448300583691203169560884064580154687490-49560854847908966448300584094486315874489058884010901506.lzo.index
2016-06-30 12:24:54   35346568 2016-06-30-49560855358216918826273542616346666947570982844761112626-49560855358216918826273542662897564557651972254920081458.lzo
2016-06-30 12:25:00       4216 2016-06-30-49560855358216918826273542616346666947570982844761112626-49560855358216918826273542662897564557651972254920081458.lzo.index
2016-06-30 12:25:31   35323348 2016-06-30-49560854836580187887447025641313705724439277089566228546-49560854836580187887447026026605617990539362832943677506.lzo
2016-06-30 12:25:34       4224 2016-06-30-49560854836580187887447025641313705724439277089566228546-49560854836580187887447026026605617990539362832943677506.lzo.index
2016-06-30 12:28:13   35404615 2016-06-30-49560854822954432571144815168094040075055571620016947218-49560854822954432571144815541683550407286042554192101394.lzo
2016-06-30 12:28:15       4216 2016-06-30-49560854822954432571144815168094040075055571620016947218-49560854822954432571144815541683550407286042554192101394.lzo.index
2016-06-30 12:32:22   35072908 2016-06-30-49561016824153341599026061846938318788758930180255252514-49561016824153341599026063527108258592449098785444855842.lzo
2016-06-30 12:32:27       4224 2016-06-30-49561016824153341599026061846938318788758930180255252514-49561016824153341599026063527108258592449098785444855842.lzo.index
2016-06-30 12:33:29   35478230 2016-06-30-49560854847908966448300584094487524800308673513185607682-49560854847908966448300584508652212416264944400096821250.lzo
2016-06-30 12:33:32       4224 2016-06-30-49560854847908966448300584094487524800308673513185607682-49560854847908966448300584508652212416264944400096821250.lzo.index
2016-06-30 12:33:44   35430688 2016-06-30-49560855358216918826273542662898773483471586884094787634-49560855358216918826273542709206677003810033768553513010.lzo
2016-06-30 12:33:48       4224 2016-06-30-49560855358216918826273542662898773483471586884094787634-49560855358216918826273542709206677003810033768553513010.lzo.index
2016-06-30 12:34:40   35407743 2016-06-30-49560854836580187887447026026606826916358977462118383682-49560854836580187887447026408569357475240370884942233666.lzo
2016-06-30 12:34:43       4224 2016-06-30-49560854836580187887447026026606826916358977462118383682-49560854836580187887447026408569357475240370884942233666.lzo.index
2016-06-30 12:37:53   35428302 2016-06-30-49560854822954432571144815541684759333105657183366807570-49560854822954432571144815936587631865142043353829867538.lzo
2016-06-30 12:37:57       4224 2016-06-30-49560854822954432571144815541684759333105657183366807570-49560854822954432571144815936587631865142043353829867538.lzo.index
2016-06-30 12:40:11   35337675 2016-06-30-49561016824153341599026063527115512147366786560493092898-49561016824153341599026065108285307656995351578663714850.lzo
2016-06-30 12:40:15       4216 2016-06-30-49561016824153341599026063527115512147366786560493092898-49561016824153341599026065108285307656995351578663714850.lzo.index
2016-06-30 12:42:03   35386272 2016-06-30-49560855358216918826273542709207885929629648397728219186-49560855358216918826273542755886929676589791936379879474.lzo
2016-06-30 12:42:06       4224 2016-06-30-49560855358216918826273542709207885929629648397728219186-49560855358216918826273542755886929676589791936379879474.lzo.index
2016-06-30 12:42:11   35337749 2016-06-30-49560854847908966448300584508653421342084559029271527426-49560854847908966448300584905531678663371245354194305026.lzo
2016-06-30 12:42:17       4224 2016-06-30-49560854847908966448300584508653421342084559029271527426-49560854847908966448300584905531678663371245354194305026.lzo.index
2016-06-30 12:43:15   35311160 2016-06-30-49560854836580187887447026408570566401059985514116939842-49560854836580187887447026762695162111675307021106675778.lzo
2016-06-30 12:43:17       4224 2016-06-30-49560854836580187887447026408570566401059985514116939842-49560854836580187887447026762695162111675307021106675778.lzo.index
2016-06-30 12:47:15   35358115 2016-06-30-49560854822954432571144815936588840790961657983004573714-49560854822954432571144816328178047651434344486086377490.lzo
2016-06-30 12:47:19       4216 2016-06-30-49560854822954432571144815936588840790961657983004573714-49560854822954432571144816328178047651434344486086377490.lzo.index
2016-06-30 12:48:58   35441152 2016-06-30-49561016824153341599026065108286516582814966207838421026-49561016824153341599026066719678248656989645483452399650.lzo
2016-06-30 12:49:02       4224 2016-06-30-49561016824153341599026065108286516582814966207838421026-49561016824153341599026066719678248656989645483452399650.lzo.index
2016-06-30 12:50:57   35183080 2016-06-30-49560855358216918826273542755888138602409406565554585650-49560855358216918826273542802562346646091091862385328178.lzo
2016-06-30 12:50:59       4216 2016-06-30-49560855358216918826273542755888138602409406565554585650-49560855358216918826273542802562346646091091862385328178.lzo.index
2016-06-30 12:51:22   35376195 2016-06-30-49560854847908966448300584905532887589190859983369011202-49560854847908966448300585335423282266694229854917754882.lzo
2016-06-30 12:51:27       4224 2016-06-30-49560854847908966448300584905532887589190859983369011202-49560854847908966448300585335423282266694229854917754882.lzo.index
2016-06-30 12:52:14   35253382 2016-06-30-49560854836580187887447026762696371037494921650281381954-49560854836580187887447027143651866388637327733624406082.lzo
2016-06-30 12:52:18       4216 2016-06-30-49560854836580187887447026762696371037494921650281381954-49560854836580187887447027143651866388637327733624406082.lzo.index
2016-06-30 12:56:43   35368182 2016-06-30-49560854822954432571144816328179256577253959115261083666-49560854822954432571144816731848050227316021362799149074.lzo
2016-06-30 12:56:46       4224 2016-06-30-49560854822954432571144816328179256577253959115261083666-49560854822954432571144816731848050227316021362799149074.lzo.index
2016-06-30 12:56:51   35325571 2016-06-30-49561016824153341599026066719679457582809260112627105826-49561016824153341599026068377919483018690051434197549090.lzo
2016-06-30 12:56:54       4216 2016-06-30-49561016824153341599026066719679457582809260112627105826-49561016824153341599026068377919483018690051434197549090.lzo.index
2016-06-30 12:59:48   35392943 2016-06-30-49560855358216918826273542802563555571910706491560034354-49560855358216918826273542849232927912313930041876545586.lzo
2016-06-30 12:59:51       4216 2016-06-30-49560855358216918826273542802563555571910706491560034354-49560855358216918826273542849232927912313930041876545586.lzo.index
2016-06-30 13:00:46   35519348 2016-06-30-49560854847908966448300585335424491192513844484092461058-49560854847908966448300585761362907365696991858404622338.lzo
2016-06-30 13:00:51       4224 2016-06-30-49560854847908966448300585335424491192513844484092461058-49560854847908966448300585761362907365696991858404622338.lzo.index
2016-06-30 13:01:10   35184217 2016-06-30-49560854836580187887447027143653075314456942362799112258-49560854836580187887447027534617267526188862902498230338.lzo
2016-06-30 13:01:14       4224 2016-06-30-49560854836580187887447027143653075314456942362799112258-49560854836580187887447027534617267526188862902498230338.lzo.index
2016-06-30 13:06:23   35356104 2016-06-30-49561016824153341599026068377920691944509666063372255266-49561016824153341599026070118990269911286780031580241954.lzo
2016-06-30 13:06:27       4216 2016-06-30-49561016824153341599026068377920691944509666063372255266-49561016824153341599026070118990269911286780031580241954.lzo.index
2016-06-30 13:07:17   35247784 2016-06-30-49560854822954432571144816731849259153135635991973855250-49560854822954432571144817127408578405222760375463706642.lzo
2016-06-30 13:07:21       4216 2016-06-30-49560854822954432571144816731849259153135635991973855250-49560854822954432571144817127408578405222760375463706642.lzo.index
2016-06-30 13:09:44   35531377 2016-06-30-49560855358216918826273542849234136838133544671051251762-49560855358216918826273542896151338971557769607364214834.lzo
2016-06-30 13:09:49       4224 2016-06-30-49560855358216918826273542849234136838133544671051251762-49560855358216918826273542896151338971557769607364214834.lzo.index
2016-06-30 13:10:49   35279954 2016-06-30-49560854836580187887447027534618476452008477531672936514-49560854836580187887447027938967895338513572362438312002.lzo
2016-06-30 13:10:54       4224 2016-06-30-49560854836580187887447027534618476452008477531672936514-49560854836580187887447027938967895338513572362438312002.lzo.index
2016-06-30 13:11:08   35247785 2016-06-30-49560854847908966448300585761364116291516606487579328514-49560854847908966448300586165672431700154805713111089154.lzo
2016-06-30 13:11:11       4216 2016-06-30-49560854847908966448300585761364116291516606487579328514-49560854847908966448300586165672431700154805713111089154.lzo.index
2016-06-30 13:15:26   35360756 2016-06-30-49561016824153341599026070118991478837106394660754948130-49561016824153341599026071768737591464374801125961891874.lzo
2016-06-30 13:15:29       4224 2016-06-30-49561016824153341599026070118991478837106394660754948130-49561016824153341599026071768737591464374801125961891874.lzo.index
2016-06-30 13:17:28   35239622 2016-06-30-49560854822954432571144817127409787331042375004638412818-49560854822954432571144817526841295983355158352699064338.lzo
2016-06-30 13:17:34       4216 2016-06-30-49560854822954432571144817127409787331042375004638412818-49560854822954432571144817526841295983355158352699064338.lzo.index
2016-06-30 13:18:35   35246510 2016-06-30-49560855358216918826273542896152547897377384236538921010-49560855358216918826273542942821920237780607580697002034.lzo
2016-06-30 13:18:39       4216 2016-06-30-49560855358216918826273542896152547897377384236538921010-49560855358216918826273542942821920237780607580697002034.lzo.index
2016-06-30 13:19:25   35252145 2016-06-30-49560854836580187887447027938969104264333186991613018178-49560854836580187887447028339261368100211586861820280898.lzo
2016-06-30 13:19:29       4224 2016-06-30-49560854836580187887447027938969104264333186991613018178-49560854836580187887447028339261368100211586861820280898.lzo.index
2016-06-30 13:20:25   35277829 2016-06-30-49560854847908966448300586165673640625974420342285795330-49560854847908966448300586574313537246291838993166237698.lzo
2016-06-30 13:20:39       4224 2016-06-30-49560854847908966448300586165673640625974420342285795330-49560854847908966448300586574313537246291838993166237698.lzo.index
2016-06-30 13:24:06   35185252 2016-06-30-49561016824153341599026071768738800390194415755136598050-49561016824153341599026073296979405665455282398030725154.lzo
2016-06-30 13:24:09       4216 2016-06-30-49561016824153341599026071768738800390194415755136598050-49561016824153341599026073296979405665455282398030725154.lzo.index

#4

Thanks @13scoobie - based on the bunching it looks like you might have 4 shards in your Kinesis stream - is that right?
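
A rough way to check the bunching is to group the .lzo files by the leading digits of the first sequence number in each file name. This is only a sketch: it assumes that files from the same shard share a long sequence-number prefix (which is how the listing above looks, but is not a documented guarantee), and `group_by_sequence_prefix` and `prefix_len=20` are made up for this example.

```python
from collections import defaultdict

def group_by_sequence_prefix(listing, prefix_len=20):
    """Group .lzo files by the leading digits of their first sequence number."""
    groups = defaultdict(list)
    for line in listing.strip().splitlines():
        date, time, size, name = line.split()
        if not name.endswith(".lzo"):
            continue  # skip the .lzo.index companion files
        # File names look like <yyyy>-<mm>-<dd>-<first seq>-<last seq>.lzo
        first_seq = name.split("-")[3]
        groups[first_seq[:prefix_len]].append((time, int(size)))
    return groups

# Three .lzo lines and one .index line copied from the listing above
sample = """
2016-06-30 12:00:27   35417753 2016-06-30-49560854836580187887447024381070193992988435592526168130-49560854836580187887447024824997426362037286595662446658.lzo
2016-06-30 12:00:31       4224 2016-06-30-49560854836580187887447024381070193992988435592526168130-49560854836580187887447024824997426362037286595662446658.lzo.index
2016-06-30 12:01:15   35503855 2016-06-30-49560855358216918826273542476096764762438383403932844082-49560855358216918826273542523127605922906384169862955058.lzo
2016-06-30 12:09:06   35371894 2016-06-30-49560854836580187887447024824998635287856901224837152834-49560854836580187887447025236752724819501612532500529218.lzo
"""

groups = group_by_sequence_prefix(sample)
for prefix, files in groups.items():
    print(prefix, [time for time, _ in files])
print("distinct groups:", len(groups))
```

Run over the full listing, the number of distinct groups gives a rough shard count to compare against the guess above, and the timestamps within each group show how often each one flushes.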


#5

We have 5 shards on our input, 3 on our enrichment right now.


#6

So, guessing now that you pointed that out: we have a 128MB byte limit, and four 32MB chunks would add up to roughly 128MB. I just did not think about the limit being split across shards, with everything dumped once the total for the stream hit 128MB.

Is that correct?


#7

Hi @13scoobie - interesting! Are you running Kinesis S3 on a single box (presumably one with 4 or more vCPUs)?


#8

Yes @alex - an m4.large right now. I did bump the config up to 512MB, and it looks like that pegged the server at 100% and it never wrote anything to S3. I have dropped it back down to 256MB now, trying to find the sweet spot for good file size and performance.


#9

Hey @13scoobie - yes, generally you need to be a bit careful with making the buffer size too large because the whole batch needs to be kept in memory before writing and it can easily blow the Java Heap. (You can certainly tweak the Heap on Java startup to get the optimal buffer size.)

It feels like a bug if the buffer limits are shared across the number of KCL workers within a given instance of Kinesis S3. But the ‘correct’ behavior would be problematic as well, because it means that the memory consumption of the Kinesis S3 app is technically unbounded - because if the KCL decided to attempt to process say 3 shards then we would be putting 3 shards worth of the buffer limit into Heap. So the Heap requirement becomes unpredictable.
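
To make the Heap concern concrete, here is a back-of-envelope sketch of the 'correct' (per-worker) behavior. It assumes each shard worker would fill its own buffer up to byte-limit and ignores JVM object overhead, so real numbers would be higher; the function name is hypothetical.

```python
def worst_case_buffer_bytes(byte_limit, shards_on_instance):
    """Upper bound on buffered bytes if every shard worker fills its own buffer."""
    return byte_limit * shards_on_instance

byte_limit = 128_000_000  # byte-limit from the config earlier in this thread
for shards in (1, 3, 5):
    mb = worst_case_buffer_bytes(byte_limit, shards) / 1_000_000
    print(f"{shards} shard(s) on this instance: up to {mb:.0f} MB buffered")
```

Because the KCL decides how many shards an instance picks up, the multiplier is not something you control from the config, which is what makes the Heap requirement hard to predict.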

This is an example of how the KCL by operating at the level of a server is problematic. Scheduling something at the level of a server (rather than container-style at the level of a single (virtual) core) makes behavior very hard to reason about. We are planning on changing this in the future - basically shipping our Kinesis apps container-ized with the goal of boxing each KCL to working on a single shard.

I’ve created a ticket to explore this further: https://github.com/snowplow/snowplow/issues/2761


#10

Hello,

I recently noticed the same thing with our events, but came to the conclusion that the 30 MB files you see are compressed (lzo) files, while the byte limit configuration applies to the uncompressed data. If you decompress a file, it will probably be around 128 MB?
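
Here is a small sketch of that effect. It is not how Kinesis S3 is implemented: it uses zlib from the Python standard library as a stand-in for lzo, the records are made up, and the 1 MB limit is only there to keep the example fast.

```python
import zlib

def flush_size(records, byte_limit):
    """Buffer records until the uncompressed byte limit is reached, then compress."""
    buffered = bytearray()
    for record in records:
        buffered += record
        if len(buffered) >= byte_limit:
            break  # the limit is checked against uncompressed bytes
    return len(buffered), len(zlib.compress(bytes(buffered)))

# Hypothetical tab-separated events, for illustration only
records = (f"event\t{i}\tpage_view\texample.com\n".encode() for i in range(1_000_000))
raw, compressed = flush_size(records, byte_limit=1_000_000)
print(f"uncompressed: {raw} bytes, compressed: {compressed} bytes")
```

The file that lands in storage has the compressed size, so it can be well below the configured limit even though the limit was what triggered the flush.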