
with the main benefit seemingly coming from bypassing stdio

I just want to clarify what that means, because to some people this may sound as if Perl IO is slow and it's always better to use sysread(). In the examples that BrowserUK++ provided, the main benefit comes from the fact that with sysread() the code looks at the data being read only once (plus a little overhead for finding that last "\n"). In the case of normal Perl IO, i.e., <FH>, every character is looked at twice -- first by Perl to figure out where each line ends, then by the code itself to split everything into separate fields. That's why you're seeing a ~50% increase in performance. You can confirm this by checking the user and system times for normal Perl IO and for sysread(). You'll see that system time is pretty much the same in both cases, but user time will vary.

--perlplexer
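For anyone who wants to check the user vs. system time claim, here is a minimal sketch of the two approaches side by side. It is not BrowserUK's benchmark: the sub names, the comma-separated layout, and the tiny temp file are all assumptions made for illustration, and it assumes the data has no blank lines. It reports the first two values returned by times() (user and system CPU seconds) around each run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test data: 1000 lines of 5 comma-separated fields.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} join(',', $_, 'a', 'b', 'c', 'd'), "\n" for 1 .. 1000;
close $tmp or die "close: $!";

# Approach 1: <FH> scans for each line end, then split scans the line again.
sub count_fields_readline {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my $fields = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /,/, $line, -1;
        $fields += @f;
    }
    close $fh;
    return $fields;
}

# Approach 2: sysread a block, locate only the LAST "\n", and split the
# complete lines in one pass. The partial line is carried to the next block.
sub count_fields_sysread {
    my ($file, $bufsize) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my ($fields, $tail) = (0, '');
    while (1) {
        my $n = sysread($fh, my $chunk, $bufsize);
        die "sysread: $!" unless defined $n;
        last if $n == 0;                      # end of file
        my $buf     = $tail . $chunk;
        my $last_nl = rindex($buf, "\n");
        if ($last_nl < 0) { $tail = $buf; next; }   # no complete line yet
        $tail = substr($buf, $last_nl + 1);
        my @f = split /[,\n]/, substr($buf, 0, $last_nl), -1;
        $fields += @f;
    }
    if (length $tail) {                       # final line without "\n"
        my @f = split /,/, $tail, -1;
        $fields += @f;
    }
    close $fh;
    return $fields;
}

for my $case (
    [ 'readline', sub { count_fields_readline($path) } ],
    [ 'sysread',  sub { count_fields_sysread($path, 64 * 1024) } ],
) {
    my ($name, $code) = @$case;
    my ($u0, $s0) = times;                    # user and system CPU seconds
    my $count = $code->();
    my ($u1, $s1) = times;
    printf "%-8s fields=%d user=%.2fs system=%.2fs\n",
        $name, $count, $u1 - $u0, $s1 - $s0;
}
```

With a file this small both timings will be close to zero; point $path at something in the tens of megabytes to see a meaningful difference.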

Re: Re: Re: split and sysread()
by dws (Chancellor) on Apr 20, 2003 at 16:11 UTC
    In the case of normal Perl IO, i.e., <FH>, every character is looked at twice -- first by Perl to figure out where each line ends, then by the code itself to split everything into separate fields. That's why you're seeing a ~50% increase in performance.

    I disagree. The "looking at each character" is relatively cheap. Making a new string out of each line, however, is more expensive. The sysread() approach places less load on Perl's memory management.

      The CPU cycles needed for memory allocation certainly count, but they are a fairly small percentage of the CPU cycles needed to compare each and every character in a 75MB file -- roughly 78 million comparisons. That is considerable even if you do them directly in assembly language. I don't know for sure, but I'm pretty certain that in Perl every comparison involves at least one function call, and those are much more expensive than simple cmp instructions.

      Another way to show that it's not just the memory allocation that matters is to plot a graph showing execution speed vs the size of the read buffer. You'll notice that after a certain point the graph will flatten out and you won't see much improvement no matter how much memory you allocate.

      --perlplexer
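The buffer-size experiment suggested above could be sketched roughly like this. The sub name scan_with_buffer, the generated test file, and the list of buffer sizes are assumptions for illustration only. The script prints elapsed wall-clock time per buffer size, which you can then plot.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Hypothetical test data: 50,000 identical comma-separated lines.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "field1,field2,field3,field4\n" x 50_000;
close $tmp or die "close: $!";

# Read the whole file with sysread() using the given buffer size and
# count newlines in each block. Returns (bytes read, lines seen).
sub scan_with_buffer {
    my ($file, $bufsize) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my ($bytes, $lines) = (0, 0);
    while (1) {
        my $n = sysread($fh, my $chunk, $bufsize);
        die "sysread: $!" unless defined $n;
        last if $n == 0;                  # end of file
        $bytes += $n;
        $lines += ($chunk =~ tr/\n//);    # tr counts matches without modifying
    }
    close $fh;
    return ($bytes, $lines);
}

# Double the buffer from 1KB up to 1MB and time each full pass.
for (my $size = 1024; $size <= 1024 * 1024; $size *= 2) {
    my $t0 = time;
    my ($bytes, $lines) = scan_with_buffer($path, $size);
    my $elapsed = time - $t0;
    printf "%8d bytes/buffer: %d lines in %.4fs\n", $size, $lines, $elapsed;
}
```

On a file this small the operating system cache will dominate, so treat the numbers as a shape to look for rather than a measurement; a much larger input and several repetitions per size would give a more trustworthy curve.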