I haven't benchmarked this myself either, but I've seen benchmarks in the past. Slurping is faster by far than processing line-by-line. The reason is that doing I/O in a single operation is faster than doing it a bit at a time, since you don't have to worry about things like repositioning the drive head.
Naturally, you have to worry about memory limitations. There are really only a limited number of cases where slurping is worth it. If your file is small enough to fit in memory, you won't see much speed gain. If it's too large, you'll end up swapping to the hard disk and will thus lose any benefits from slurping.
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
Note: All code is untested, unless otherwise stated
Perl's I/O is buffered, so it does one I/O for every disk block regardless of which method you use.
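#!/usr/bin/perl
use strict;
use warnings;

# Usage: script line|slurp <input
# Any argument other than 'line' selects the all-at-once branch.
if (@ARGV && $ARGV[0] eq 'line')
{
    print "Line-at-a-time\n";
    # Read and test one line per iteration.
    while (<STDIN>)
    {
        print if /perl/;
    }
}
else
{
    print "All at once\n";
    # Read every line into memory first, then test each one.
    my @arr = (<STDIN>);
    foreach (@arr)
    {
        print if /perl/;
    }
}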
On my system, the block size is 4096 bytes. On an 8K file with 128 lines, we see:
$ strace -e read /tmp/t29 line </tmp/t29.8192 >/dev/null
...
read(0, "This is a line that contains the"..., 4096) = 4096
read(0, "This is a line that contains the"..., 4096) = 4096
read(0, "", 4096) = 0
$ strace -e read /tmp/t29 slurp </tmp/t29.8192 >/dev/null
...
read(0, "This is a line that contains the"..., 4096) = 4096
read(0, "This is a line that contains the"..., 4096) = 4096
read(0, "", 4096) = 0
Even so, the diamond operator has some overhead each time it is called, so slurping is probably still faster, but not because of I/O.
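For anyone who wants to measure this rather than guess, here is a minimal sketch using the core Benchmark module. It is untested against the numbers above; the sub names, the sample line, and the 100_000-line file size are all my own assumptions, not anything from the test script. It only compares CPU time for the two reading styles on a file that is almost certainly sitting in the OS cache, so it says nothing about disk behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Count matching lines, reading one line at a time.
sub count_by_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /perl/;
    }
    close $fh;
    return $count;
}

# Count matching lines after reading the whole file into an array.
sub count_by_slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    my $count = grep { /perl/ } @lines;
    return $count;
}

# Build a hypothetical sample file: $n identical lines mentioning "perl".
# The file is removed automatically when the program exits.
sub make_sample_file {
    my ($n) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "This is a line that contains the word perl\n" for 1 .. $n;
    close $fh;
    return $path;
}

# Only run the benchmark when executed as a script, not when loaded.
unless (caller) {
    my $path = make_sample_file(100_000);
    # Run each variant for at least 2 CPU seconds, then print a comparison table.
    cmpthese(-2, {
        line  => sub { count_by_line($path) },
        slurp => sub { count_by_slurp($path) },
    });
}
```

Try it with a few different file sizes. If the buffering argument holds, any gap between the two should come from per-line overhead in Perl itself, and should shrink or reverse once the array no longer fits comfortably in memory.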
And of course the caching algorithms of your hard drive and OS will have a big influence as well. If the file is small enough to be read in one go, slurping will not give any big added speed benefit, but will increase memory load. And if the file is rather large, it may crowd out other items in your cache and slow down other programs: TANSTAAFL, as Heinlein said.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law