The performance on that may not be as bad as you think. I benchmarked my read-by-chunks solution against a solution that changes the input record separator to a space. The latter makes the code much simpler, since the only special case you have to watch for is the newlines. It was also about 50% faster:
$ perl 1125570a.pl
              Rate read_buffer  change_irs
read_buffer 1.15/s          --        -33%
change_irs  1.72/s         50%          --
$ cat 1125570a.pl
#!/usr/bin/env perl
use Modern::Perl;
use Benchmark qw(:all);
# setup long multiline strings with lines ending in 0
my $line1 = join ' ', (map { int(rand()*100) } 1..1000000), 0;
$line1 =~ s/ 0 / 0\n/g;
my $line2 = $line1;
cmpthese( 10, {
    'read_buffer' => \&read_buffer,
    'change_irs'  => \&change_irs,
});
sub read_buffer {
    my $l;                        # chunk of a line
    my $tiny_buffer = 1000000;    # buffer size of chunks
    my $leftover = '';            # leftover, possibly partial number at end of buffer
    open my $in, '<', \$line1;
    while ( read $in, $l, $tiny_buffer ) {
        $l = $leftover . $l;
        # say " ;$l;";
        $leftover = '';
        if ( $l =~ s/(\d+)$//g ) {
            if ( $1 == 0 ) {
                $l .= '0';
                $leftover = '';
            } else {
                $leftover = $1;
            }
        }
        for (split ' ', $l) {
            if ( $_ == 0 ) {
                # say 'Reached a zero';
            } else {
                # say "; $_ ;"; # process a number
            }
        }
    }
}
sub change_irs {
    open my $in, '<', \$line2;
    local $/ = ' ';
    while ( <$in> ) {
        # say " $_";
        if ( $_ =~ /0\n(\d+)/ ) {
            # say 'Reached a zero';
            # say "; $1 ;"; # process a number
        } elsif ( $_ == 0){
            # say 'Reached a zero';
        } else {
            # say "; $_ ;"; # process a number
        }
    }
}
A larger buffer should make the read_buffer solution faster, but I don't know whether it would ever catch up to the $/ = ' ' solution. Considering how much clearer that one's code is, I think it wins.
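Stripped of the benchmark scaffolding, the $/ = ' ' approach can be sketched roughly like this. The helper name read_tokens and the sample data are made up for illustration, and splitting each record on whitespace is just one way to deal with a newline that ends up inside a record:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the benchmark above): return every
# whitespace-separated token from a filehandle, reading one
# space-terminated record at a time.
sub read_tokens {
    my ($fh) = @_;
    local $/ = ' ';    # records now end at a space instead of a newline
    my @tokens;
    while ( my $record = <$fh> ) {
        # A record can still contain a newline (for example "0\n33 "),
        # so split on any whitespace to get one or two clean tokens.
        push @tokens, split ' ', $record;
    }
    return @tokens;
}

# Made-up sample data: two lines, each ending in 0.
my $data = "12 7 0\n33 5 0";
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @tokens = read_tokens($in);
close $in;
print "@tokens\n";
```

Because $/ is set with local inside the sub, the change does not leak into the rest of the program.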
EDIT: It also occurs to me that reading the file from disk might make a difference, if the $/ = ' ' solution causes more disk reads. I'd expect OS buffering to prevent that, but I don't know for sure. You'd want to benchmark that against your actual situation.
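One way to check the disk question is to run both approaches against a real file instead of an in-memory string. The following is only a sketch under assumptions: the file size, the 64 KB chunk size, and the helper names count_by_chunks and count_by_irs are arbitrary choices for illustration, and the subs just count tokens rather than process them. After the first pass the file will most likely be served from the OS cache, so this measures the I/O layer overhead more than cold disk reads. As far as I know, Perl's buffered I/O reads in blocks regardless of $/, but that is worth confirming on your system.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Write test data to a real temporary file (size and contents are assumptions).
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} join( ' ', map { int( rand() * 100 ) } 1 .. 200_000 ), " 0\n";
close $out or die "Cannot close $path: $!";

# Count tokens by reading fixed-size chunks and carrying any partial
# trailing token over to the next chunk.
sub count_by_chunks {
    my ( $file, $size ) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my ( $count, $leftover, $chunk ) = ( 0, '' );
    while ( read $in, $chunk, $size ) {
        $chunk = $leftover . $chunk;
        # Hold back trailing non-whitespace: it may continue in the next chunk.
        $leftover = $chunk =~ s/(\S+)\z// ? $1 : '';
        my @tokens = split ' ', $chunk;
        $count += @tokens;
    }
    $count++ if length $leftover;    # final token without trailing whitespace
    return $count;
}

# Count tokens by reading space-terminated records.
sub count_by_irs {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    local $/ = ' ';
    my $count = 0;
    while ( my $record = <$in> ) {
        my @tokens = split ' ', $record;    # a record may span a newline
        $count += @tokens;
    }
    return $count;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    read_buffer => sub { count_by_chunks( $path, 65_536 ) },
    change_irs  => sub { count_by_irs($path) },
});
```

Varying the second argument to count_by_chunks is a quick way to see how much the buffer size matters for your data.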
Aaron B.