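Here is my (loosely tested) solution. The two layers of buffering (the invoice being assembled in $invoice, and the pending output file in $outfile_buffer) turned out to be close to BrowserUK's description. Putting the output-side buffering in a separate write_invoice sub keeps the two buffers explicitly separate.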
```perl
use strict;
use warnings;
use File::Slurp;

use constant {
    CHUNK_LIMIT     => 500 * 1024,
    EXCEPTION_LIMIT => 25 * 1024 * 1024,
};

my $in_filename = 'infile.dat';

# Filename generators
{
    my $filenumber = 0;
    sub next_output_filename {
        return sprintf( 'outfile_%03d.dat', $filenumber++ );
    }
}
{
    my $filenumber = 0;
    sub next_special_filename {
        return sprintf( 'special_%03d.dat', $filenumber++ );
    }
}

{
    my $outfile_buffer = '';

    sub write_invoice {
        my ( $invoice, $force_flush ) = @_;
        my $invoice_len = length $invoice;
        my $buffer_len  = length $outfile_buffer;

        # If the invoice is special, write it immediately to
        # a Special file, bypassing the $outfile_buffer queueing.
        if ( $invoice_len >= EXCEPTION_LIMIT ) {
            write_file( next_special_filename(), $invoice );
            return 1;
        }

        # If the invoice would make the $outfile_buffer too
        # big (or a flush is forced), flush it.
        my $too_big = $buffer_len + $invoice_len >= CHUNK_LIMIT;
        if ( $too_big or $force_flush ) {
            if ($buffer_len) {
                write_file( next_output_filename(), $outfile_buffer );
                $outfile_buffer = '';
            }
        }

        # Store the invoice with the rest waiting to be
        # written to file (test the length, so an invoice "0" is kept).
        $outfile_buffer .= $invoice if $invoice_len;
        return 1;
    }
}

open my $in_fh, '<', $in_filename
    or die "Can't open '$in_filename': $!";

my $invoice = '';
while (<$in_fh>) {
    # '11' in columns 68-69 marks the first line of a new invoice,
    # so hand the completed invoice to the output buffer first.
    if ( length($_) >= 69 and substr( $_, 67, 2 ) eq '11' ) {
        write_invoice($invoice);
        $invoice = '';
    }
    $invoice .= $_;
}
close $in_fh or warn "Can't close '$in_filename': $!";

write_invoice($invoice) if length $invoice;
write_invoice( '', 'FORCE' );    # flush whatever is still buffered
```
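To try the batching rule without touching the filesystem, here is a minimal file-free sketch of the same logic as write_invoice. The sub name batch_invoices and the tiny limits are hypothetical, chosen only for illustration; instead of calling write_file, it collects regular chunks and oversized ("special") invoices into arrays.

```perl
use strict;
use warnings;

# Hypothetical sketch: same flush rules as write_invoice, but the
# limits are passed in and output goes to arrays instead of files.
# Returns two array refs: regular chunks and special (oversized) invoices.
sub batch_invoices {
    my ( $chunk_limit, $exception_limit, @invoices ) = @_;
    my ( @chunks, @specials );
    my $buffer = '';

    for my $invoice (@invoices) {
        # Oversized invoices bypass the buffer entirely.
        if ( length($invoice) >= $exception_limit ) {
            push @specials, $invoice;
            next;
        }

        # Flush the pending chunk if this invoice would make it too big.
        if ( length($buffer) + length($invoice) >= $chunk_limit
            and length $buffer )
        {
            push @chunks, $buffer;
            $buffer = '';
        }
        $buffer .= $invoice;
    }

    # Final forced flush, like write_invoice( '', 'FORCE' ).
    push @chunks, $buffer if length $buffer;
    return ( \@chunks, \@specials );
}

my ( $chunks, $specials ) =
    batch_invoices( 10, 20, 'aaaa', 'bbbb', 'cccc', 'x' x 25, 'dd' );
print scalar(@$chunks), " chunks, ", scalar(@$specials), " special\n";
```

With these limits the run prints `2 chunks, 1 special`: the 25-character invoice goes straight to the special list, and the third 4-character invoice pushes the buffer past the chunk limit, forcing 'aaaabbbb' out as the first chunk.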

In reply to Re: Removing Large Invoices from a Data File by Util
in thread Removing Large Invoices from a Data File by JayDog
