in reply to Re: Thrashing on very large lines
in thread Thrashing on very large lines
There was one bug when $buf was overwritten by the read() command. I've cleaned it up until it sorts the same number of records as the one-liner.
scalar @ARGV == 3 or die "usage: this.pl <suspect.txt> <good.txt> <bad.txt>\n"; open(my $fh_in, '<', shift) or die("Unable to open input file: $!\n"); open(my $fh_good, '>', shift) or die("Unable to open 'good' output file: $!\n"); open(my $fh_bad, '>', shift) or die("Unable to open 'bad' output file: $!\n"); binmode($fh_in); binmode($fh_good); binmode($fh_bad); my $buf = ''; # stores the working buffer my $newbuf = ''; # used for reading next chunk my $is_continued = 0; # remembers that the current record is longer th +an it appears local $/ = "\n"; # Because we're using binmode. my $rec_len = 1000 + length($/); # Working buffer size can be up to $blk_size + $rec_len bytes. my $blk_size = 8192; for (;;) { my $read = read($fh_in, $newbuf, $blk_size); defined $read or die("Unable to read input file: $!\n"); $buf .= $newbuf; # if it didn't read anything new, flush the buffer and end if ($read == 0) { print $fh_bad $buf if length $buf; last; } while (length($buf) >= $rec_len) { my $pos = index($buf, $/); if ($pos < 0) { # no line ending, but long enough print $fh_bad $buf; $buf = ''; # flush, ends while $is_continued = 1; } else { # line ending found $pos += length($/); my $is_bad = $is_continued || $pos != $rec_len; print {$is_bad ? $fh_bad : $fh_good} (substr($buf, 0, $pos)); substr($buf, 0, $pos, ''); # clip written section $is_continued = 0; # reset whenever line ending found } } }
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^3: Thrashing on very large lines
by ikegami (Patriarch) on Apr 20, 2006 at 21:48 UTC | |
by chr1so (Acolyte) on Apr 21, 2006 at 00:47 UTC |