in reply to Re: is split optimized?
in thread is split optimized?

Mea culpa, fellow monk, but I must disagree with your answer. Yes, the benchmarks say something, but I feel that what they say needs deeper interpretation.

As Russ said, split is incredibly well optimized. Most of the perl internals are. Many C coders of wondrous talent have pored over the code to make it so. Your code demonstrates that the AM was not using the correct tool, which is an answer to an unasked question.

Your four pieces of code are doing radically different things. The regex is stopping after the first match, while the split must work the entire string. Until you compare apples to apples, no conclusion can be drawn. Let us run this test and do it correctly. Note the slight changes I made to the regex code. That should result in a better comparison.

    #!/usr/local/bin/perl -w
    use strict;
    use Benchmark;

    my $testlarge = "a " x 100000;
    my $testsmall = "a b c d e f";

    timethese(-10, {
        One   => sub { my ($y) = (split(/\s+/,$testlarge))[0]; },
        Two   => sub { my ($y) = (split(/\s+/,$testsmall))[0]; },
        Three => sub { my $y = ( $testsmall =~ (/([^\s]*)\s+/g))[0]; },
        Four  => sub { my $y = ( $testlarge =~ (/([^\s]*)\s+/g))[0]; },
    });

    mik@mach5:/home/mik/monk)./benchthis.pl
    Benchmark: running Four, One, Three, Two, each for at least 10 CPU seconds...
          Four: 19 wallclock secs (18.38 usr +  0.02 sys = 18.40 CPU) @  1.14/s (n=21)
           One: 13 wallclock secs (12.72 usr +  0.00 sys = 12.72 CPU) @  2.12/s (n=27)
         Three: 12 wallclock secs (10.28 usr +  0.00 sys = 10.28 CPU) @ 12748.55/s (n=131071)
           Two: 11 wallclock secs (10.00 usr +  0.00 sys = 10.00 CPU) @ 18600.60/s (n=186006)

When comparing apples to apples, it seems split is highly optimized. This is more an issue of choosing the right tool for the job at hand.
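For anyone who wants to see the difference in work without timing anything, here is a rough sketch that just counts what each approach produces on the large string. It reuses $testlarge and the regex from the benchmark above; the variable names @fields, $first, and @all are mine, and the un-/g regex stands in for the "stop after the first match" version that was being compared against.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $testlarge = "a " x 100000;

# split with no LIMIT has to walk the whole string before the
# caller gets a chance to throw away everything but field 0.
my @fields = split /\s+/, $testlarge;

# A regex without /g returns as soon as the first match succeeds.
my ($first) = $testlarge =~ /([^\s]*)\s+/;

# The same regex with /g in list context also walks the whole string.
my @all = $testlarge =~ /([^\s]*)\s+/g;

printf "split fields: %d\n", scalar @fields;
printf "first match:  %s\n", $first;
printf "/g matches:   %d\n", scalar @all;
```

The split and the /g regex both produce the full list of fields, which is why those two are the fair pair to time against each other.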

This rant brought to you by
mikfire

Apples, Oranges, and Fruit
by gryng (Hermit) on Jul 14, 2000 at 06:59 UTC
    I agree that I was not comparing equivalent code (in terms of the work requested). However, my point was to submit code that plainly only got the first argument, in order to show how much more work split was doing. As pointed out later by btrott and nardo, adding split's third argument brings it back up to the regex's speed, but that is because the two are now doing an equivalent amount of work.

    I answered Anonymous Monk's question of whether "perl optimize(s) the split and only grab the first field" using the line: ($y) = (split(/\s+/,$x))[0]; for which the answer is no. But as I conceded to btrott, he (and nardo) had the "correct" answer: you need to add a third argument to get split to stop after the first match.
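To make the third-argument point concrete, here is a minimal sketch of the two forms side by side. The sample string and the names $all_then_first, $first, and $rest are only for illustration; a LIMIT of 2 is my choice for "first field plus the remainder", not necessarily the exact value btrott or nardo posted.

```perl
use strict;
use warnings;

my $x = "a b c d e f";

# No LIMIT: split breaks up the whole string, then the slice keeps field 0.
my ($all_then_first) = (split /\s+/, $x)[0];

# LIMIT of 2: split stops after the first separator, and the second
# element holds the unsplit remainder of the string.
my ($first, $rest) = split /\s+/, $x, 2;

print "slice: $all_then_first\n";
print "first: $first\n";
print "rest:  $rest\n";
```

As far as I can tell from perldoc -f split, when the result is assigned directly to a list of variables and LIMIT is omitted, perl supplies a LIMIT one larger than the number of variables, but that does not apply once the split sits inside a list slice like the one above.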

    I appreciate that you probably responded because I used regexes in my example and you did not want Anonymous Monk to mistakenly think that regexes were faster than split. However, I do not think he took it that way; rather, his post seemed to convey an understanding of everything I just mentioned above (minus the fact that we did not know or remember the third argument to split).

    Anyway, this is mainly here to clear up the comment in the chatterbox about Benchmark being a hammer. I was only using it to show, fairly concretely, that split was not stopping after the first match. I didn't mean to imply that it couldn't.

    Cheers,
    Gryn