Derek2 has asked for the wisdom of the Perl Monks concerning the following question:

On Dalnet #perl we debated the efficiency of a regex like m/^(foo)?stuff/ against m/(foo|)stuff/.
(Where are you Japhy!) I decided to test it a few
million times using UNIX time and a short script:
#!/usr/bin/perl $string = "foofoo catbar"; for ($i = 0; $i<5000000; $i++) { if ($string =~ m/(foob|)foofoo/) { } # if ($string =~ m/(foob)?foofoo/) { } }
Running this with (foob|) my results are:
 12.180u 0.010s 0:12.19 100.0%
Running this with (foob)? my results are:
 13.090u 0.010s 0:13.09 100.0%
Can someone explain technically why (foob|) gives better
performance than using the (foob)? method? Is it just
my example causing this (as both involve backtracing),
or is there an obvious explanation from within the sources? Looking forward to it :)

edited: Sat Sep 28 02:28:25 2002 by jeffa - code tags, formatting

Replies are listed 'Best First'.
Re: Difference between (foo|) and (foo)?
by tadman (Prior) on Sep 28, 2002 at 05:31 UTC
    Okay, hold the phone. If you're going to compare things in a scientific manner, at least give the question mark variation a fighting chance:
    #!/usr/bin/perl -w use strict; use Benchmark 'cmpthese'; my @string = ( "foofoo catbar", "foofoofoo catbar", "foo foo cat bar", "foo flew over the", "cufoofoo nest", ); cmpthese(50_000, { foo_or => sub { /^(foob|)foofoo/ foreach (@string) }, foo_qs => sub { /^(foob)?foofoo/ foreach (@string) } });
    With that in mind, the results are different:
    Benchmark: timing 50000 iterations of foo_or, foo_qs... foo_or: 2 wallclock secs ( 1.28 usr + 0.00 sys = 1.28 CPU) @ 39 +062.50/s (n=50000) foo_qs: 2 wallclock secs ( 1.21 usr + 0.00 sys = 1.21 CPU) @ 41 +322.31/s (n=50000) Rate foo_or foo_qs foo_or 39062/s -- -5% foo_qs 41322/s 6% --
    The variance changes, but the question mark wins every time.
      I added foo_ro => sub { /^(|foob)foofoo/ foreach (@string) } (and upped the iterations) and got
      Benchmark: timing 500000 iterations of foo_or, foo_qs, foo_ro... foo_or: 3 wallclock secs ( 2.55 usr + 0.00 sys = 2.55 CPU) @ 19 +6386.49/s (n=500000) foo_qs: 3 wallclock secs ( 2.45 usr + 0.00 sys = 2.45 CPU) @ 20 +3832.04/s (n=500000) foo_ro: 2 wallclock secs ( 2.39 usr + 0.00 sys = 2.39 CPU) @ 20 +9205.02/s (n=500000) Rate foo_or foo_qs foo_ro foo_or 196386/s -- -4% -6% foo_qs 203832/s 4% -- -3% foo_ro 209205/s 7% 3% --
      This shows the ro doing better than even the qs. I suppose that (foob|) vs (|foob) depends on the data(?).

      --traveler

        (|foob) is the same as (foob)??; (foob|) would be (foob)?, so comparing them isn't really fair.


        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      <html> All you've done in this test is change the weighting of the string to match against in favor of the (foo)? method, in contrast to my original string "foofoo catbar" which is quite possibly weighted in favor of (foo|) better performance. That said, I like the thought behind your approach.

      I altered the test to benchmark each string seperately:

      
      #!/usr/bin/perl -w
      
      use strict;
      
      use Benchmark 'cmpthese';
      
      my @string = (
              "foofoo catbar",
              "foofoofoo catbar",
              "foo foo cat bar",
              "foo flew over the",
              "cufoofoo nest",
      );
      
      cmpthese(1500000, {
              foo_or_0 => sub { $string[0] =~ /^(foob|)foofoo/ },
              foo_or_1 => sub { $string1 =~ /^(foob|)foofoo/ },
              foo_or_2 => sub { $string2 =~ /^(foob|)foofoo/ },
              foo_or_3 => sub { $string3 =~ /^(foob|)foofoo/ },
              foo_or_4 => sub { $string4 =~ /^(foob|)foofoo/ },
      
              foo_qs_0 => sub { $string[0] =~ /^(foob)?foofoo/ },
              foo_qs_1 => sub { $string1 =~ /^(foob)?foofoo/ },
              foo_qs_2 => sub { $string2 =~ /^(foob)?foofoo/ },
              foo_qs_3 => sub { $string3 =~ /^(foob)?foofoo/ },
              foo_qs_4 => sub { $string4 =~ /^(foob)?foofoo/ },
      });
      
      If we benchmark the strings seperately, we get the following:

      
      ddouville@linuxdld:~> ./test.pl
      Benchmark: timing 1500000 iterations of foo_or_0, foo_or_1, foo_or_2, foo_or_3, foo_or_4, foo_qs_0, foo_qs_1, foo_qs_2, foo_qs_3, foo_qs_4...
        foo_or_0:  2 wallclock secs ( 2.21 usr +  0.00 sys =  2.21 CPU) @ 678733.03/s (n=1500000)
        foo_or_1:  3 wallclock secs ( 2.13 usr +  0.00 sys =  2.13 CPU) @ 704225.35/s (n=1500000)
        foo_or_2:  0 wallclock secs ( 0.72 usr +  0.00 sys =  0.72 CPU) @ 2083333.33/s (n=1500000)
        foo_or_3: -1 wallclock secs ( 0.50 usr + -0.01 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
        foo_or_4:  1 wallclock secs ( 1.26 usr +  0.00 sys =  1.26 CPU) @ 1190476.19/s (n=1500000)
        foo_qs_0:  1 wallclock secs ( 2.07 usr +  0.00 sys =  2.07 CPU) @ 724637.68/s (n=1500000)
        foo_qs_1:  2 wallclock secs ( 2.02 usr +  0.00 sys =  2.02 CPU) @ 742574.26/s (n=1500000)
        foo_qs_2:  0 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 2272727.27/s (n=1500000)
        foo_qs_3:  2 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
        foo_qs_4:  2 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 1442307.69/s (n=1500000)
                    Rate foo_or_0 foo_or_1 foo_qs_0 foo_qs_1 foo_or_4 foo_qs_4 foo_or_2 foo_qs_2 foo_qs_3 foo_or_3
      foo_or_0  678733/s       --      -4%      -6%      -9%     -43%     -53%     -67%     -70%     -78%     -78%
      foo_or_1  704225/s       4%       --      -3%      -5%     -41%     -51%     -66%     -69%     -77%     -77%
      foo_qs_0  724638/s       7%       3%       --      -2%     -39%     -50%     -65%     -68%     -76%     -76%
      foo_qs_1  742574/s       9%       5%       2%       --     -38%     -49%     -64%     -67%     -76%     -76%
      foo_or_4 1190476/s      75%      69%      64%      60%       --     -17%     -43%     -48%     -61%     -61%
      foo_qs_4 1442308/s     113%     105%      99%      94%      21%       --     -31%     -37%     -53%     -53%
      foo_or_2 2083333/s     207%     196%     188%     181%      75%      44%       --      -8%     -32%     -32%
      foo_qs_2 2272727/s     235%     223%     214%     206%      91%      58%       9%       --     -26%     -26%
      foo_qs_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       --      -0%
      foo_or_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       0%       --
      
      

      Interesting results. benchmark reports the opposite for "foofoo catbar" (my original string) than UNIX 'time' command did. Am I reading the results correctly? I compared foo_or_N to foo_qs_N and this is what I have:

      "foofoo catbar":          QS wins by -6%
      "foofoofoo catbar":       QS wins by -5%
      "foo foo cat bar":        QS wins by -8%
      "foo flew over the":      OR and QS tie 0%
      "cufoofoo nest":          QS wins by -17%
      

      That leaves a question: is there a pattern situation where OR can win?

Re: Difference between (foo|) and (foo)?
by BrowserUk (Patriarch) on Sep 28, 2002 at 06:52 UTC

    This intigued me and I thought maybe the 7.5% difference was in the time it took to parse and/or compile the differences in the regexes. So I thought I'd benchmark the test with them pre-compiled. The results are very intriguing. Not only does the difference between the two compiled versions remain pretty much the same, if anything getting slightly bigger. The pre-compiled versions actually run substantially more slowly than their none pre-compiled counterparts? This is most extreme in the case of the (foob|) version running close to 40% faster than its precompiled counterpart.

    I'd like to see the explanation behind them onions? Probably my test methodology at fault, but I can't see it.

    It took that a stage further and applied study to the searched string. This resulted in a speed-up of the slowest (precompiled (foob)?) and the fastest (the non-precompiled (foob|)), but consistantly slowed the other two varients down.

    Intriguing indeed. The test code and results are below

      You went to a bit of trouble to analyse this, so I thought I'd repay you (hey, I work in QA all day -- test methodology is something I should know about!). At the end of this post I show that your test results are valid, but my conclusion is that your method is wrong. In your test, the pre-compiled optimized expression is contained in a variable which means it has to be pushed onto the stack and interpolated. The non-optimized expression was inline. So I changed a few things to make this more consistent and got different results. I put the non-optimized regular expressions into variables to keep the test consistent (so it has to push a variable onto the stack and interpolate it and all that jazz).

      Here's the code:

      
      #!/usr/bin/perl
      no warnings;
      use strict;
      use Benchmark qw(cmpthese);
      
      $::string = "foofoo catbar";
      $::re_foobOrNowt     = qr/(foob|)foofoo/o;
      $::re_foob0or1        = qr/(foob)?foofoo/o;
      $::foobOrNowt = qr/(foob|)foofoo/;
      $::foob0or1   = qr/(foob)?foofoo/;
      
      #study $::string; print 'After studying the searched string'.$/;
      cmpthese( 1000000, {
          foobOrNowt    => 'if ($string =~ $::foobOrNowt) { };',
          foob0or1    => 'if ($string =~ $::foob0or1) { };',
          c_foobOrNowt=> 'if ($string =~ $::re_foobOrNowt) { };',
          c_foob0or1    => 'if ($string =~ $::re_foob0or1  ) { };',
      });
      
      Here's the results:
      
      ddouville@linuxdld:~> ./test2.pl 
      Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
      c_foob0or1:  3 wallclock secs ( 1.92 usr +  0.00 sys =  1.92 CPU) @ 520833.33/s (n=1000000)
      c_foobOrNowt:  2 wallclock secs ( 1.73 usr +  0.00 sys =  1.73 CPU) @ 578034.68/s (n=1000000)
        foob0or1:  1 wallclock secs ( 1.96 usr +  0.00 sys =  1.96 CPU) @ 510204.08/s (n=1000000)
      foobOrNowt:  2 wallclock secs ( 1.99 usr +  0.02 sys =  2.01 CPU) @ 497512.44/s (n=1000000)
                       Rate   foobOrNowt     foob0or1   c_foob0or1 c_foobOrNowt
      foobOrNowt   497512/s           --          -2%          -4%         -14%
      foob0or1     510204/s           3%           --          -2%         -12%
      c_foob0or1   520833/s           5%           2%           --         -10%
      c_foobOrNowt 578035/s          16%          13%          11%           --
      
      Code:
      
      #!/usr/bin/perl
      no warnings;
      use strict;
      use Benchmark qw(cmpthese);
      
      $::string = "foofoo catbar";
      $::re_foobOrNowt     = qr/(foob|)foofoo/o;
      $::re_foob0or1        = qr/(foob)?foofoo/o;
      $::foobOrNowt = qr/(foob|)foofoo/;
      $::foob0or1   = qr/(foob)?foofoo/;
      
      #study $::string; print 'After studying the searched string'.$/;
      cmpthese( 1000, {
          foobOrNowt    => 'for (1..10000) { if ($string =~ $::foobOrNowt) { };}',
          foob0or1    => 'for (1..10000) { if ($string =~ $::foob0or1) { };}',
          c_foobOrNowt=> 'for (1..10000) { if ($string =~ $::re_foobOrNowt) { };}',
          c_foob0or1    => 'for (1..10000) { if ($string =~ $::re_foob0or1  ) { };}',
      });
      
      Results
      
      ddouville@linuxdld:~> ./test2.pl
      Benchmark: timing 1000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
      c_foob0or1: 16 wallclock secs (15.97 usr +  0.00 sys = 15.97 CPU) @ 62.62/s (n=1000)
      c_foobOrNowt: 16 wallclock secs (16.25 usr +  0.00 sys = 16.25 CPU) @ 61.54/s (n=1000)
        foob0or1: 17 wallclock secs (16.34 usr +  0.00 sys = 16.34 CPU) @ 61.20/s (n=1000)
      foobOrNowt: 17 wallclock secs (17.32 usr +  0.00 sys = 17.32 CPU) @ 57.74/s (n=1000)
                     Rate   foobOrNowt     foob0or1 c_foobOrNowt   c_foob0or1
      foobOrNowt   57.7/s           --          -6%          -6%          -8%
      foob0or1     61.2/s           6%           --          -1%          -2%
      c_foobOrNowt 61.5/s           7%           1%           --          -2%
      c_foob0or1   62.6/s           8%           2%           2%           --
      
      My tests show a performance increase in the compiled versions.

      Finally, here's your initial test (unaltered), run on my own machine for base-line comparison: ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000) c_foobOrNowt: 1 wallclock secs ( 1.78 usr + 0.00 sys = 1.78 CPU) @ 561797.75/s (n=1000000) foob0or1: 0 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 751879.70/s (n=1000000) foobOrNowt: 1 wallclock secs ( 1.34 usr + 0.00 sys = 1.34 CPU) @ 746268.66/s (n=1000000) Rate c_foob0or1 c_foobOrNowt foobOrNowt foob0or1 c_foob0or1 520833/s -- -7% -30% -31% c_foobOrNowt 561798/s 8% -- -25% -25% foobOrNowt 746269/s 43% 33% -- -1% foob0or1 751880/s 44% 34% 1% -- These test results agree with your test results, supporting that your results are correct for the test you performed.

        Sorry,  forgot to format that.
        
        Finally,  here's your initial test (unaltered),   run on my own machine for base-line comparison:
        
        ddouville@linuxdld:~> ./test2.pl
        Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
        c_foob0or1:  2 wallclock secs ( 1.92 usr +  0.00 sys =  1.92 CPU) @ 520833.33/s (n=1000000)
        c_foobOrNowt:  1 wallclock secs ( 1.78 usr +  0.00 sys =  1.78 CPU) @ 561797.75/s (n=1000000)
          foob0or1:  0 wallclock secs ( 1.33 usr +  0.00 sys =  1.33 CPU) @ 751879.70/s (n=1000000)
        foobOrNowt:  1 wallclock secs ( 1.34 usr +  0.00 sys =  1.34 CPU) @ 746268.66/s (n=1000000)
                         Rate   c_foob0or1 c_foobOrNowt   foobOrNowt     foob0or1
        c_foob0or1   520833/s           --          -7%         -30%         -31%
        c_foobOrNowt 561798/s           8%           --         -25%         -25%
        foobOrNowt   746269/s          43%          33%           --          -1%
        foob0or1     751880/s          44%          34%           1%           --
        
        These test results agree with your test results,  supporting that your results are correct for the test you performed.