
Hmm. Sorry to be the parade-rainer, but the benchmark isn't actually measuring what it should be. @x is a lexical here, and when Benchmark takes those strings and evaluates them, @x is out of scope. That's why they're all going so fast; there's nothing to loop over.
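For anyone who wants to see the scoping problem in isolation, here is a minimal sketch. The package `Hypothetical::Runner` and both of its subs are made up for illustration; they only stand in for the way a module evaluates a code string in its own scope, and are not Benchmark's actual internals. The string version sees an empty package variable, while the closure keeps its hold on the lexical `@x`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module that evaluates code strings.
# The eval runs in a scope where the caller's lexicals are not visible.
{
    package Hypothetical::Runner;
    no strict 'vars';    # so the eval'd @x quietly resolves to a package variable

    sub count_via_string { my ($code) = @_; return eval $code; }
    sub count_via_sub    { my ($code) = @_; return $code->(); }
}

my @x = ("a b c", "d e f");

# The string is compiled inside Hypothetical::Runner, so @x there is the
# (empty) package array, not the lexical declared above.
my $seen_by_string = Hypothetical::Runner::count_via_string('scalar(@x)');

# The anonymous sub is a closure and carries the lexical @x with it.
my $seen_by_sub = Hypothetical::Runner::count_via_sub(sub { scalar(@x) });

print "string eval sees $seen_by_string element(s)\n";
print "closure sees $seen_by_sub element(s)\n";
```

Passing anonymous subs (or declaring the data with `use vars`) sidesteps the problem.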

Also, and more minorly, your for_index_substr routine isn't working -- $_ is being re-aliased on the second for loop and the actual target string is being lost.
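A small sketch of the aliasing issue, using made-up variable names rather than the original routine: an inner `for` without its own loop variable takes over `$_`, while naming the inner variable leaves the outer `$_` alone.

```perl
use strict;
use warnings;

my @strings = ("one two three");
my (@inner_sees, @fixed_sees);

foreach (@strings) {
    # No loop variable: $_ is re-aliased to the counter inside this loop.
    for (0 .. 1) { push @inner_sees, $_; }

    # Named loop variable: $_ still refers to the outer string.
    for my $n (0 .. 1) { push @fixed_sees, $_; }
}

print "inner loop without a name saw: @inner_sees\n";
print "inner loop with 'my \$n' saw: ", join(" | ", @fixed_sees), "\n";
```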

I took the liberty of making a couple of changes and re-running:

use strict;
use Benchmark qw(cmpthese);

my @x;
my @result;
$x[0] = "mark " x 35;
$x[1] = "asdfasdfasdfasdfasdf " x 35;
$x[2] = "as " x 31;
$x[3] = "asdfasdfasdfasdfasdfasdf " x 100;
$x[4] = "asdfasdfasdfasdfasdfasdf " x 10;

cmpthese (-5, {
    'regexp' => sub {
        my $i;
        foreach (@x) {
            /^((?:\S+\s*){1,30})/;
            $result[$i++]{REx} = $1;
        }
    },
    'split_join' => sub {
        my $i;
        foreach (@x) {
            $result[$i++]{split} = join " ", (split " ", $_,31)[0..29];
        }
    },
    'for_index_substr' => sub {
        my $i;
        foreach (@x) {
            my $ind = index ($_, " ");
            for my $foo (0..28) {
                last if $ind == -1;
                $ind = index $_, " ", $ind + 1;
            }
            $result[$i++]{index} = substr($_,0,$ind);
        }
    },
} );

for (@result) {
    print "bad!" unless ($_{REx} eq $_{split} and $_{split} eq $_{index});
}

This produces:

                   Rate       split_join for_index_substr           regexp
split_join       4294/s               --              -7%             -27%
for_index_substr 4618/s               8%               --             -21%
regexp           5849/s              36%              27%               --

-dlc

RE: (dchetlin: Benchmark fixes) 30 Spaces- 1 question
by extremely (Priest) on Oct 10, 2000 at 08:29 UTC

    *sigh* That will teach me, eh? I should have run it with it not printing to STDERR and that piped to /dev/null =)

    Also, in my defense, it was after my bedtime =) Don't apologize for being sharp...

    What is worse is that I tested a working version of the for_index_substr but "cleaned it up" for the final run.

    I normally use vars ... to avoid the scope issue. =( *woe is me*

    OTOH, I don't get your numbers when running your code. Number one, your results check should be:

    foreach (@result) { print "bad!" unless ($_->{REx} eq $_->{split} and $_->{split} eq $_->{index}); }

    Those arrows are important. Second, the comparison will always fail because your regex includes the terminal space and the split version doesn't. Also, if there were multiple spaces or other types of whitespace they would fail but we both agreed to ignore that. =)
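    To make the trailing-space mismatch concrete, here is a rough sketch on one of the test strings. The second pattern is only a hypothetical alternative that matches whitespace between words, not something either of us benchmarked.

```perl
use strict;
use warnings;

my $str = "as " x 31;

# Pattern from the benchmark: each \s* swallows the space after a word,
# so the capture ends with a trailing space.
my ($with_space) = $str =~ /^((?:\S+\s*){1,30})/;

# Hypothetical alternative: whitespace is only matched between words.
my ($trimmed) = $str =~ /^(\S+(?:\s+\S+){0,29})/;

# The split_join approach from the benchmark, for comparison.
my $split = join " ", (split " ", $str, 31)[0..29];

printf "regexp: %d chars, trimmed regexp: %d chars, split_join: %d chars\n",
    length($with_space), length($trimmed), length($split);
print "trimmed regexp and split_join ",
    ($trimmed eq $split ? "agree" : "differ"), "\n";
```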

    Third, I still get equivalent results (split_join_2 has the non-magical /\s+/ regex, just for fun). I get this with cmpthese 20,000 (using cmpthese -10 gives equivalent results).

                       Rate split_join_2   split_join        regexp for_index_substr
    split_join_2     3205/s           --          -0%           -4%              -6%
    split_join       3205/s           0%           --           -4%              -6%
    regexp           3344/s           4%           4%            --              -2%
    for_index_substr 3413/s           6%           6%            2%               --

    Worse, the array slice hack on the end of (split...)[0..29] throws warnings on -w if there are too few items in the slice to join. Join doesn't like undef it appears =)
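    One possible way around the slice warnings, as a sketch only: clamp the upper index to what split actually returned. The sub name `first_words` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: keep at most $max whitespace-separated words.
sub first_words {
    my ($str, $max) = @_;
    my @words = split " ", $str, $max + 1;
    # Never slice past the last word we actually have.
    my $last = $#words < $max - 1 ? $#words : $max - 1;
    return join " ", @words[0 .. $last];
}

# Collect warnings instead of printing them, so the two versions can be compared.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $naive = join " ", (split " ", "a b c", 31)[0..29];
my $naive_warnings = scalar @warnings;

@warnings = ();
my $short = first_words("a b c", 30);
my $long  = first_words("as " x 31, 30);
my $clamped_warnings = scalar @warnings;

print "naive slice: $naive_warnings warning(s), result is ", length($naive), " chars\n";
print "clamped slice: $clamped_warnings warning(s), result is '$short'\n";
```

    Note that the naive slice also pads the result: join puts a separator between the undef entries, so the short string comes back with trailing spaces.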

    All in all, I'm less enlightened now. Did I cut-n-paste wrong? I print debugged to verify that bits were working all thru it. *sigh*

    --
    $you = new YOU;
    honk() if $you->love(perl)

      Hmm, that was dumb of me. But as you point out, as long as we know it's working, the comparison is unnecessary anyways.

      I'm much more intrigued that you're getting different results than I am. What version of Perl are you running?

      [~] $ perl -v
      This is perl, v5.6.0 built for i686-linux
      (with 1 registered patch, see perl -V for more detail)
      Copyright 1987-2000, Larry Wall
      Binary build 618 provided by ActiveState Tool Corp. http://www.ActiveState.com
      Built 04:53:25 Sep 14 2000

      Update: Hmm. I can't get any consistency out of this benchmark. Out of the various machines I tried it on, the REx solution is about half of the time way ahead of the pack, and the other half they're all about the same. Weird. Eerie.

      -dlc

        This is perl, v5.6.0 built for i686-linux

        Built this version myself on a Linux 2.2.17 box.

        UPDATED:
        OK, I have a new 5.6.0 build at home, on the old Pentium 166:

        This is perl, v5.6.0 built for i586-linux

        That produces these results with your script, no changes, 'cept I whacked the check off the end.

                          Rate for_index_substr           regexp       split_join
        for_index_substr 395/s               --             -24%             -24%
        regexp           518/s              31%               --              -0%
        split_join       520/s              32%               0%               --
        

        Bet on us looking at different optimizations for processor and platform and compiler.

        IMHO, the regex or the index should usually win. The split_join one creates a whole array and then loops over it again to build a new string. The regex just matches along the string, and internally $1 is a substr created by the match; you just link a real name to the SV* record. Index should beat a regex: the overhead of creating an inner loop manually and following it shouldn't be that bad, and substr is real fast. =) Your test surprised me because regex won by so much. This newest one makes no sense.
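        For reference, the index/substr idea can be written as a standalone sub. This is a sketch, not the code that was timed; the name `first_words_by_index` is made up, and unlike the benchmarked routine it returns the whole string when there are too few spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $str just before its $count-th space.
sub first_words_by_index {
    my ($str, $count) = @_;
    my $pos = -1;
    for my $n (1 .. $count) {
        $pos = index($str, " ", $pos + 1);
        return $str if $pos == -1;    # fewer than $count spaces: keep it all
    }
    return substr($str, 0, $pos);
}

my $long  = first_words_by_index("as " x 31, 30);
my $short = first_words_by_index("a b c", 30);

print "long string cut to ", length($long), " chars\n";
print "short string kept as '$short'\n";
```

        No array is built and no regex engine is involved: just repeated index calls and a single substr at the end.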

        If all the results turn out to be valid, what are the platform differences, really?

        Another box, 5.004_04 i386-linux (oh man does this need an overhaul?) gives me 13sec@regex, 16sec@split, 18.5sec@index. I'm just making this worse, aren't I?

        --
        $you = new YOU;
        honk() if $you->love(perl)