Kc12349 has asked for the wisdom of the Perl Monks concerning the following question:

Hello All,

I am a bit confused by the benchmarks below. I have read in multiple places, including a regex tutorial here, that you should avoid the /g modifier when it is not needed. This made sense to me: if the first match is all you want, the engine shouldn't have to keep searching once it has found it.

The metrics below (in the code comments) don't seem to bear this out. Can anyone alleviate my confusion?

use feature 'say';    # needed for say(); "use 5.010;" also works
use Time::HiRes qw(sleep time);

my $time = time;
for (1..1_000_000) {
    my $str = '123456789';
    my ($a,$b) = $str =~ m/(23)[^8]+(8)/g;    # about 1.88 sec
    #my ($a,$b) = $str =~ m/(23)[^8]+(8)/;    # about 2.15 sec
    #my ($a) = $str =~ m/(23)/g;              # about 1.23 sec
    #my ($a) = $str =~ m/(23)/;               # about 1.41 sec
}
say time - $time;

Edit: I'm running Strawberry 5.12.3

Re: Why does global match run faster than non-global?
by BrowserUk (Patriarch) on Aug 23, 2011 at 20:19 UTC

    Intriguing. I can confirm your findings (5.10.1):

    use Benchmark qw[ cmpthese ];

    our $str = '123456789';
    cmpthese -1, {
        a=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/g; ],
        b=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/; ],
        c=>q[ my ($a) = $str =~ m/(23)/g ],
        d=>q[ my ($a) = $str =~ m/(23)/; ],
    };

          Rate    b    a    d    c
    b 388799/s   -- -14% -37% -52%
    a 449698/s  16%   -- -27% -44%
    d 612922/s  58%  36%   -- -24%
    c 806251/s 107%  79%  32%   --

    I can't even begin to guess why it would be so. 16% and 32% are hardly noise.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Because you're using an old(ish) version of Perl?

             Rate    b    a    c    d
      b 3778652/s   --  -1% -19% -21%
      a 3817631/s   1%   -- -18% -20%
      c 4677165/s  24%  23%   --  -2%
      d 4766254/s  26%  25%   2%   --

      This is perl 5, version 14, subversion 0 (v5.14.0) built for i686-linux-thread-multi

        Even more intriguing. The slowest has hardly changed, but the previously faster ones have slowed markedly.

        C:\test\perl-5.14.0-RC1>perl
        use Benchmark qw[ cmpthese ];;
        print $];;
        $str = '123456789';
        cmpthese -1, {
            a=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/g; ],
            b=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/; ],
            c=>q[ my ($a) = $str =~ m/(23)/g ],
            d=>q[ my ($a) = $str =~ m/(23)/; ],
        };;
        ^Z
        5.014000
              Rate    b    a    d    c
        b 363518/s   -- -17% -35% -46%
        a 435446/s  20%   -- -22% -35%
        d 555991/s  53%  28%   -- -17%
        c 668598/s  84%  54%  20%   --

        But still, 20% is not to be sneezed at.

        I'm running Strawberry Perl v5.12.3 built for MSWin32-x86-multi-thread. What confuses me is that, by its nature, /g seems like it should take longer.

        You did ensure that $str was declared with our, not my, didn't you?
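Why our versus my matters here: as far as I know, when cmpthese is given code as a string, Benchmark compiles that string with eval inside its own module (switching to the caller's package), so the caller's lexicals are out of scope and an unqualified $str falls back to the package variable. The sketch below imitates that with a hypothetical helper, compile_elsewhere, rather than calling Benchmark itself; it illustrates the scoping rule under that assumption and is not Benchmark's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Benchmark's handling of a code *string*:
# the string is compiled by eval inside this sub, so it can only see
# package variables and lexicals in scope here, not the caller's.
sub compile_elsewhere {
    my ($code) = @_;
    my $sub = eval "no strict 'vars'; no warnings 'uninitialized'; sub { $code }"
        or die $@;
    return $sub;
}

our $pkg_str = '123456789';    # package variable: visible to the eval

sub demo {
    my $lex_str = '123456789';    # lexical: invisible to the eval
    my $from_pkg = compile_elsewhere(q[ ($pkg_str =~ /(23)/)[0] ])->();
    # $lex_str below resolves to the (undefined) package var $main::lex_str
    my $from_lex = compile_elsewhere(q[ ($lex_str =~ /(23)/)[0] ])->();
    return ($from_pkg, $from_lex);
}

my ($from_pkg, $from_lex) = demo();
print 'our: ', $from_pkg // 'undef', "\n";    # our: 23
print 'my:  ', $from_lex // 'undef', "\n";    # my:  undef
```

With my, the benchmarked snippets end up matching against undef, which fails almost immediately and inflates the rates.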

      I can't replicate your results with the two versions I currently have installed on this machine:

      $ /usr/local/bin/perl5.10.1 921987.pl
             Rate    b    a    c    d
      b 4497569/s   --  -2% -21% -26%
      a 4591346/s   2%   -- -19% -25%
      c 5681139/s  26%  24%   --  -7%
      d 6116693/s  36%  33%   8%   --
      $ /usr/local/bin/perl5.10.1 -v
      This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi
      $ /usr/local/bin/perl5.12.2 921987.pl
             Rate    a    b    c    d
      a 4314282/s   --  -8% -30% -36%
      b 4677165/s   8%   -- -24% -31%
      c 6168093/s  43%  32%   --  -9%
      d 6779346/s  57%  45%  10%   --
      $ /usr/local/bin/perl5.12.2 -v
      This is perl 5, version 12, subversion 2 (v5.12.2) built for x86_64-linux-thread-multi

      If there is any significant difference at all, it tends to be the other way around, i.e. /g is slower.

        No matter how long I run it for, it is remarkably consistent here with no more than 1 or 2% variation:

        C:\test\perl-5.14.0-RC1>perl
        use Benchmark qw[ cmpthese ];;
        print $];;
        $str = '123456789';
        cmpthese -10, {
            a=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/g; ],
            b=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/; ],
            c=>q[ my ($a) = $str =~ m/(23)/g ],
            d=>q[ my ($a) = $str =~ m/(23)/; ],
        };;
        ^Z
        5.014000
              Rate    b    a    d    c
        b 357543/s   -- -15% -33% -45%
        a 422192/s  18%   -- -21% -35%
        d 535621/s  50%  27%   -- -18%
        c 653518/s  83%  55%  22%   --

        One difference of note is that I'm using Windows rather than your Linux. Your results match ikegami's, and I believe he was also using Linux. Perhaps the OP is on Windows?

        The 'usual suspect' for performance differences between those two is memory allocation, but there is none worth noting here. Indeed, as you would expect, there appear to be no calls into the OS at all during the benchmark.

        Since we're both on 64-bit Intel hardware, that doesn't seem a likely cause. That pretty much leaves only compiler differences, with the tentative conclusion that with /g enabled, the Windows build takes a code path that causes (or allows) the MSC compiler to generate a particularly efficient piece of code somewhere.

        I think we (or at least I) may be chasing noise. I can't replicate the larger 20% number, and I get noise in both directions over multiple runs. My guess is that over enough iterations we'd see a slightly slower /g.

        use Benchmark qw[ cmpthese ];;
        my $str = '123456789';
        cmpthese -1, {
            a=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/g; ],
            b=>q[ my ($a,$b) = $str =~ m/(23)[^8]+(8)/; ],
            c=>q[ my ($a) = $str =~ m/(23)/g ],
            d=>q[ my ($a) = $str =~ m/(23)/; ],
        };

               Rate    a    b    c    d
        a 7047422/s   --  -2% -26% -29%
        b 7218432/s   2%   -- -25% -28%
        c 9578119/s  36%  33%   --  -4%
        d 9960542/s  41%  38%   4%   --

               Rate    a    b    c    d
        a 7143583/s   --   2% -24% -24%
        b 7005183/s  -2%   -- -25% -25%
        c 9378794/s  31%  34%   --  -0%
        d 9387510/s  31%  34%   0%   --

        You did ensure that $str was declared with our, not my, didn't you?
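One way to sidestep the our/my question entirely is to hand cmpthese code references instead of strings: closures capture a lexical $str in the usual way. A sketch under that approach (the variables are renamed from $a/$b to $x/$y to stay clear of sort's globals). Note that Benchmark calls a code reference once per iteration, so the sub-call overhead is included in every timing and the absolute rates are not directly comparable with the string-based runs in this thread.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = '123456789';    # a lexical is fine: the closures below capture it

my %tests = (
    a => sub { my ($x, $y) = $str =~ m/(23)[^8]+(8)/g; },
    b => sub { my ($x, $y) = $str =~ m/(23)[^8]+(8)/;  },
    c => sub { my ($x)     = $str =~ m/(23)/g; },
    d => sub { my ($x)     = $str =~ m/(23)/;  },
);

# Run each test for at least 1 CPU second and print the comparison table.
cmpthese(-1, \%tests);
```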

      windows 5.010000
            Rate    b    a    d    c
      b 299030/s   --  -7% -43% -53%
      a 320550/s   7%   -- -39% -49%
      d 525288/s  76%  64%   -- -17%
      c 631310/s 111%  97%  20%   --

      HP-UX 5.008008
            Rate    b    a    d    c
      b 225468/s   --  -7% -25% -30%
      a 243327/s   8%   -- -19% -25%
      d 300755/s  33%  24%   --  -7%
      c 322947/s  43%  33%   7%   --
Re: Why does global match run faster than non-global?
by bart (Canon) on Aug 23, 2011 at 22:39 UTC
    Note that you're running the regexes in list context, so what they return also differs with and without /g. That might be a clue to the cause.
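To make that concrete: in list context, a match without /g returns the captures of the first match only, while m//g returns the captures of every match and so has to keep scanning to the end of the string. On the OP's nine-character string there is only one match, so both forms return the same list. A small sketch (the tripled string is just an illustrative input, not from the thread):

```perl
use strict;
use warnings;

my $short = '123456789';         # the OP's string: exactly one match
my $long  = '123456789' x 3;     # illustrative: three matches

# Without /g: captures of the first match only.
my @first = $long =~ m/(23)[^8]+(8)/;
# With /g: captures of every match, found by scanning the whole string.
my @all   = $long =~ m/(23)[^8]+(8)/g;

my @short_first = $short =~ m/(23)[^8]+(8)/;
my @short_all   = $short =~ m/(23)[^8]+(8)/g;

print "@first\n";                           # 23 8
print "@all\n";                             # 23 8 23 8 23 8
print "@short_first | @short_all\n";        # 23 8 | 23 8
```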
Re: Why does global match run faster than non-global?
by Anonymous Monk on Aug 24, 2011 at 09:07 UTC
    Running BrowserUk's code, I get:
    perl 5.014001 / mingw-built
           Rate    a    b    c    d
    a 3016041/s   --  -2% -22% -22%
    b 3078291/s   2%   -- -20% -20%
    c 3854936/s  28%  25%   --  -0%
    d 3855233/s  28%  25%   0%   --

    perl 5.012002 / mingw-built
           Rate    b    a    c    d
    b 2760859/s   --  -1% -21% -21%
    a 2785166/s   1%   -- -20% -21%
    c 3492604/s  27%  25%   --  -0%
    d 3507350/s  27%  26%   0%   --

    perl 5.008009 / activeperl
           Rate    b    a    d    c
    b 2795403/s   --  -2% -22% -22%
    a 2852178/s   2%   -- -20% -21%
    d 3585286/s  28%  26%   --  -0%
    c 3593871/s  29%  26%   0%   --

    perl 5.006001 / mingw/msys-built
           Rate    b    a    d    c
    b 3492343/s   --  -1% -22% -23%
    a 3534227/s   1%   -- -21% -22%
    d 4477362/s  28%  27%   --  -1%
    c 4543103/s  30%  29%   1%   --
Re: Why does global match run faster than non-global?
by Anonymous Monk on Aug 25, 2011 at 11:39 UTC

    The string '123456789' is very short. Try testing with a very long string:

    use Benchmark qw[ cmpthese ];

    our $s = '123456789' x 100_000;
    cmpthese -1, {
        a=>q[ my ($a,$b) = $s =~ m/(23)[^8]+(8)/g; ],
        b=>q[ my ($a,$b) = $s =~ m/(23)[^8]+(8)/; ],
        c=>q[ my ($a) = $s =~ m/(23)/g ],
        d=>q[ my ($a) = $s =~ m/(23)/; ],
    };

    Outputs:

        Rate      a      c    d    b
    a 7.65/s     --   -34% -99% -99%
    c 11.7/s    52%     -- -99% -99%
    d 1336/s 17371% 11358%   --  -3%
    b 1380/s 17950% 11738%   3%   --
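A rough way to see why the long string flips the result is to count what each form returns on the same 900,000-character string. The non-/g match can stop after its first hit, while the /g match walks the whole string and builds a list with one entry per capture per match, which is consistent with the two-orders-of-magnitude gap in the Rate column. A sketch:

```perl
use strict;
use warnings;

# Same 900,000-character string as in the benchmark above.
my $s = '123456789' x 100_000;

# No /g: only the first match's captures are returned.
my @first  = $s =~ m/(23)[^8]+(8)/;

# /g: every match's captures are returned, one block at a time.
my @all    = $s =~ m/(23)[^8]+(8)/g;
my @single = $s =~ m/(23)/g;

printf "no /g, two groups: %d elements\n", scalar @first;     # 2
printf "/g, two groups:    %d elements\n", scalar @all;       # 200000
printf "/g, one group:     %d elements\n", scalar @single;    # 100000
```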