in reply to is split optimized?

I don't know if split is or isn't optimized; I just have a story related to split to share.

In our code library, we have a function named rand_split, which looks like this:

	sub rand_split {
	    my ($sep, $string) = @_;
	    my ($element, $char, $pos, $end, @array);

	    $end = length($string);
	    $element = "";
	    for ($pos = 0; $pos < $end; $pos++) {
	        $char = substr($string, $pos, 1);
	        if ($char eq $sep) {
	            push (@array, $element);
	            $element = "";
	        } else {
	            $element = $element . $char;
	        }
	    }
	    push (@array, $element);
	    return (@array);
	}

rand_split was written by a guy named Rand a couple of programmer generations ago. We don't know why he reimplemented split; we don't know a lot of things about it. However, another programmer from around that generation claimed, "If he did it, there must have been a reason." It's not as if he didn't know split existed, which means he reinvented the wheel on purpose. As nearly as we can tell, it walks like split and talks like split; therefore, it is split. It's used in one script these days, and probably in none in the very near future, but I think we've kept it around mainly for gag value, giving every new generation of programmers something to wonder about.

RE: Re: is split optimized?
by athomason (Curate) on Jul 14, 2000 at 10:17 UTC
    Well, rand_split doesn't walk and talk like split. split takes a /PATTERN/ as its first argument, not a single character (and it can take a third argument, etc., etc.). I can see a possible motivation for the redundant implementation, then: "Rand" might have been thinking that his version could be faster than split since it doesn't have to worry about regex matches. He probably should have checked this assumption, though, since it's blatantly wrong. I threw together this script (without warnings or strict! horror!) to check the efficiency of rand_split:

        use Benchmark;

        @chars = ('x', ' ');
        $string = "";
        $string .= $chars[rand 2] for (1..1000);

        timethese( 5000, {
            'split'      => 'split / /, $string',
            'rand_split' => 'rand_split(" ", $string)'
        });

        sub rand_split {
            my ($sep, $string) = @_;
            my ($element, $char, $pos, $end, @array);

            $end = length($string);
            $element = "";
            for ($pos = 0; $pos < $end; $pos++) {
                $char = substr($string, $pos, 1);
                if ($char eq $sep) {
                    push (@array, $element);
                    $element = "";
                } else {
                    $element = $element . $char;
                }
            }
            push (@array, $element);
            return (@array);
        }

    These were the disheartening results:

        rand_split: 43 wallclock secs (40.37 usr +  0.01 sys = 40.38 CPU) @ 123.83/s (n=5000)
             split:  4 wallclock secs ( 3.73 usr +  0.00 sys =  3.73 CPU) @ 1342.28/s (n=5000)

    Clearly, split is quite capable of optimizing static patterns and doing it much faster than Perl code (since split is, of course, implemented in C). Gag value is about all you'll get out of this routine ;-).
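
    Speed aside, the two also disagree on output in at least one case. This is a minimal sketch of that, not a full comparison: the sample string "a,b,," and the variable names are made up for illustration, and rand_split is restated with only cosmetic tidying so the snippet runs on its own. By default split discards trailing empty fields, while rand_split keeps them; passing a negative limit as split's third argument makes it keep them too.

    ```perl
    use strict;
    use warnings;

    # rand_split as posted in this thread, tidied only cosmetically.
    sub rand_split {
        my ($sep, $string) = @_;
        my @array;
        my $element = "";
        for my $pos (0 .. length($string) - 1) {
            my $char = substr($string, $pos, 1);
            if ($char eq $sep) {
                push @array, $element;
                $element = "";
            } else {
                $element .= $char;
            }
        }
        push @array, $element;
        return @array;
    }

    # Hypothetical sample input with two trailing empty fields.
    my $line = "a,b,,";

    my @by_rand  = rand_split(",", $line);   # keeps trailing empty fields
    my @by_split = split /,/, $line;         # default split drops them
    my @by_limit = split /,/, $line, -1;     # negative limit keeps them

    print scalar(@by_rand),  "\n";   # 4
    print scalar(@by_split), "\n";   # 2
    print scalar(@by_limit), "\n";   # 4
    ```

    So if that one remaining script ever relies on trailing empty fields, the closest drop-in replacement would probably be split with a limit of -1 rather than plain split.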

      Hmm... As you've clearly demonstrated, the code runs like a three-legged greyhound. I would be interested to know what version of Perl Mr Rand was using when he wrote the function, and to see the benchmark run under it. You may find the answer is the same. Then again, you may find that the split function has been optimised since rand_split was written.

      Nuance