Re: Re: Opening file and checking for data

I haven't done any benchmarking either, but I've seen benchmarks in the past. Slurping is far faster than processing line-by-line. The reason is that doing I/O in a single operation is faster than doing it a bit at a time, since you don't have to worry about things like resetting the drive head to the correct position.

Naturally, you have to worry about memory limitations. There are really only a limited number of cases where slurping is worth it. If your file is small, you won't see much speed gain. If it's too large, you'll end up swapping to the hard disk and will thus lose any benefits from slurping.
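To make the trade-off concrete, here is a minimal sketch that slurps only when the file is under a size cutoff and otherwise reads line-by-line. The sub name matching_lines and the 10 MB value in $SLURP_LIMIT are assumptions made up for illustration, not recommendations; the right cutoff depends on your machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cutoff, chosen only for illustration; tune it for your system.
my $SLURP_LIMIT = 10 * 1024 * 1024;    # 10 MB

# Return the lines of $path that match $pattern, slurping only when the
# file is smaller than the cutoff.
sub matching_lines {
    my ($path, $pattern) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    # -s gives the size in bytes (false for an empty file).
    my $size = (-s $fh) || 0;

    my @hits;
    if ($size < $SLURP_LIMIT) {
        # Small file: read every line into memory at once.
        my @lines = <$fh>;
        @hits = grep { /$pattern/ } @lines;
    }
    else {
        # Large file: hold only one line in memory at a time.
        while (my $line = <$fh>) {
            push @hits, $line if $line =~ /$pattern/;
        }
    }
    close $fh;
    return @hits;
}
```

Calling matching_lines('some.log', 'perl') returns the matching lines either way (some.log is a placeholder name), so the caller does not need to know which strategy was used.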

----
I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer

Note: All code is untested, unless otherwise stated

Re: Re: Re: Opening file and checking for data by sgifford (Prior) on Jul 03, 2003 at 19:20 UTC
Perl's I/O is buffered, so it does one I/O for every disk block regardless of which method you use.

    #!/usr/bin/perl
    if ($ARGV[0] eq 'line') {
        print "Line-at-a-time\n";
        while (<STDIN>) {
            print if /perl/;
        }
    }
    else {
        print "All at once\n";
        my @arr = (<STDIN>);
        foreach (@arr) {
            print if /perl/;
        }
    }

On my system, the block size is 4096 bytes. On an 8K file with 128 lines, we see:

    $ strace -e read /tmp/t29 line </tmp/t29.8192 >/dev/null
    ...
    read(0, "This is a line that contains the"..., 4096) = 4096
    read(0, "This is a line that contains the"..., 4096) = 4096
    read(0, "", 4096) = 0
    $ strace -e read /tmp/t29 slurp </tmp/t29.8192 >/dev/null
    ...
    read(0, "This is a line that contains the"..., 4096) = 4096
    read(0, "This is a line that contains the"..., 4096) = 4096
    read(0, "", 4096) = 0

Still, each call to the diamond operator takes some time, so slurping is probably faster, but not because of I/O.
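For anyone who wants to measure the difference rather than guess, a rough sketch using the core Benchmark module might look like the following. The sub names, the 10,000-line test file, and the iteration count of 100 are all arbitrary assumptions. Since the OS will most likely serve the file from its cache after the first read, this mostly compares Perl-side overhead, not disk I/O.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a throwaway test file; the line text and line count are arbitrary.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "This is a line that contains the word perl\n" for 1 .. 10_000;
close $out or die "Cannot close $path: $!";

# Count matching lines, reading one line at a time.
sub count_by_line {
    open my $in, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$in>) {
        $n++ if $line =~ /perl/;
    }
    close $in;
    return $n;
}

# Count matching lines after reading the whole file into an array.
sub count_by_slurp {
    open my $in, '<', $path or die "Cannot open $path: $!";
    my @lines = <$in>;
    close $in;
    return scalar grep { /perl/ } @lines;
}

# Run each version 100 times and print a comparison table.
cmpthese(100, {
    line  => \&count_by_line,
    slurp => \&count_by_slurp,
});
```

The numbers will vary with Perl version, file size, and hardware, so treat any single run as a hint and not as proof.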
Re: Re: Re: Opening file and checking for data by CountZero (Bishop) on Jul 03, 2003 at 16:08 UTC
And of course the caching algorithms of your hard drive and OS will have a big influence as well. If the file is small enough to be read in one go, slurping will not add much speed, but it will increase memory load. And if the file is rather large, it may crowd out other items in your cache and slow down other programs: TANSTAAFL, as Heinlein said.

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law