RE: RE: RE: RE (tilly) 2 (blame): File reading efficiency and other surly remarks
in reply to RE: RE: RE (tilly) 2 (blame): File reading efficiency and other surly remarks
in thread File reading efficiency and other surly remarks

I'd be very interested to see your results that show differently.

Cut and paste the code, copy Chatter.bat (33KB) to "file", and run:

    Benchmark: running BufferedFileHandle, chunk, linebyline, each for at least 3 CPU seconds...
    BufferedFileHandle: 4 wallclock secs ( 3.46 usr + 0.00 sys = 3.46 CPU) @ 386.13/s (n=1336)
                 chunk: 4 wallclock secs ( 3.63 usr + 0.00 sys = 3.63 CPU) @ 310.19/s (n=1126)
            linebyline: 4 wallclock secs ( 3.40 usr + 0.00 sys = 3.40 CPU) @ 434.71/s (n=1478)

This shows that default line-by-line is the fastest (434/s), line-by-line with an enlarged buffer is second (386/s), and chunk-and-split is the slowest (310/s).
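
The benchmark code itself isn't shown above; a rough sketch of the two main strategies might look like the following (the sub names, the 64KB chunk size, and the line-counting bodies are illustrative guesses, and the BufferedFileHandle variant, which would additionally enlarge the read buffer, e.g. via setvbuf where the build supports it, is omitted):

    use strict;
    use warnings;
    use Benchmark qw(timethese);

    my $file = 'file';    # the copy of Chatter.bat from above

    timethese( -3, {      # -3 = run each sub for at least 3 CPU seconds
        # Plain diamond-operator reading with perl's defaults.
        linebyline => sub {
            open my $fh, '<', $file or die "open: $!";
            my $count = 0;
            $count++ while <$fh>;
            close $fh;
        },
        # Read fixed-size chunks and split them into lines by hand,
        # carrying any partial last line over to the next chunk.
        chunk => sub {
            open my $fh, '<', $file or die "open: $!";
            my ( $count, $tail ) = ( 0, '' );
            while ( read $fh, my $chunk, 64 * 1024 ) {
                $tail .= $chunk;
                my @lines = split /^/m, $tail;
                $tail = ( @lines && $lines[-1] !~ /\n\z/ ) ? pop @lines : '';
                $count += @lines;
            }
            $count++ if length $tail;
            close $fh;
        },
    } );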

Now append Chatter.bat to "file" until we have a 1GB file (one way to do that is sketched below). Now we get buffered@15/s, line-by-line@13/s, chunk@9/s.
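
A sketch of the appending step (a shell loop works equally well):

    use strict;
    use warnings;

    # Append Chatter.bat to "file" repeatedly until it reaches ~1GB.
    my $src = do {
        open my $in, '<', 'Chatter.bat' or die "open: $!";
        local $/;    # slurp the whole file
        <$in>;
    };
    open my $out, '>>', 'file' or die "open: $!";
    my $size = -s 'file';
    while ( $size < 1_073_741_824 ) {    # 1GB
        print $out $src;
        $size += length $src;
    }
    close $out or die "close: $!";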

With an 85MB file: buffered@0.20/s, line-by-line@0.19/s, chunk@0.12/s.

I'd personally consider perl broken if it couldn't read a line at a time faster than I could in Perl code. Previous benchmarks have shown that Perl's overriding of stdio buffers can make perl's I/O faster than I/O in C programs using stdio. So I must be missing something about (at least) your copy of perl to understand why standard line-by-line isn't faster.

Update: I removed a pointless sentence that was probably snide. I apologize to those who already read it.

        - tye (but my friends call me "Tye")

RE (tilly) 6 (bench): File reading efficiency and other surly remarks
by tilly (Archbishop) on Aug 26, 2000 at 21:35 UTC
    As I told lhoward, the result will be highly dependent upon many things. What OS you are on, what compiler you used, whether you compiled with Perl's I/O or your native one, so on and so forth. (ObRandomNote: Ilya used to moan about the fact that Perl was "pessimized" for I/O on Linux. OTOH Perl is still faster at virtually everything else...)
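
    One quick way to check how a given perl was built is to ask its build configuration (a minimal sketch; which Config keys matter varies somewhat by version):

        use strict;
        use warnings;
        use Config;

        # Was this perl built with its own I/O layer (PerlIO) or the
        # platform's stdio?
        print "useperlio  = ", $Config{useperlio}  || 'undef', "\n";
        print "d_stdstdio = ", $Config{d_stdstdio} || 'undef', "\n";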

    I don't doubt for a second that he did that benchmark and got those numbers. I also don't doubt for a second that you did your benchmark and got your numbers as well. The lesson is that this kind of optimization can only be evaluated if you test against your actual target environment.

    But the advantages in maintainability simply cannot be disputed. In addition to the bugs I already pointed out, what happens if someone changes $/ and tries to figure out why nothing happened?
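
    To make that concrete, here is a small illustration (a sketch, not lhoward's actual reader) of why a hand-rolled chunk reader never sees $/:

        use strict;
        use warnings;

        open my $fh, '<', 'file' or die "open: $!";

        # The built-in readline honors $/, so this reads one
        # paragraph-style record, ending at the first blank line:
        {
            local $/ = "\n\n";
            my $record = <$fh>;
        }

        # A chunk reader that hard-codes "\n" never consults $/,
        # so the localized change above would have done nothing here:
        seek $fh, 0, 0 or die "seek: $!";
        read $fh, my $chunk, 4096;
        my @lines = split /\n/, $chunk;

        close $fh;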

RE: RE: RE: RE: RE (tilly) 2 (blame): File reading efficiency and other surly remarks
by lhoward (Vicar) on Aug 26, 2000 at 21:20 UTC
    Those are very interesting results. I have tested my code on several different OSes (Solaris and Linux) with several different versions of Perl (5.6, 5.005, etc.), and the chunk method has always proven faster in my tests. What OS and version of perl did you test with?

      Win98, Perl 5.6.0 (ActiveState Build 615).

      Do you have any theories about why the buffering in perl's internals isn't implemented as efficiently as your Perl code, especially considering the overhead involved in executing perl opcodes?

              - tye (but my friends call me "Tye")
        It probably has to do with how ActiveState/WinPerl implements its I/O layer versus how traditional Perl on Unix implements it. By the looks of things, your technique is faster under Windows/ActiveState while mine is faster under UNIX. Just another cross-platform gotcha to be aware of.