in reply to Difference between (foo|) and (foo)?

This intrigued me, and I thought maybe the 7.5% difference was in the time it took to parse and/or compile the two regexes, so I benchmarked the test with them pre-compiled. The results are very intriguing. Not only does the difference between the two compiled versions remain pretty much the same (if anything getting slightly bigger), but the pre-compiled versions actually run substantially more slowly than their non-pre-compiled counterparts. This is most extreme in the case of the (foob|) version, which runs close to 40% faster than its pre-compiled counterpart.

I'd like to see the explanation behind them onions. It's probably my test methodology at fault, but I can't see it.

I took that a stage further and applied study to the searched string. This resulted in a speed-up of the slowest (the precompiled (foob)?) and the fastest (the non-precompiled (foob|)), but consistently slowed the other two variants down.

Intriguing indeed. The test code and results are below.

#!/usr/bin/perl
no warnings;
use strict;
use Benchmark qw(cmpthese);

$::string = "foofoo catbar";
$::re_foobOrNowt = qr/(foob|)foofoo/o;
$::re_foob0or1   = qr/(foob)?foofoo/o;

#study $::string; print 'After studying the searched string'.$/;
cmpthese( 1000000, {
    foobOrNowt   => 'if ($string =~ m/(foob|)foofoo/) { };',
    foob0or1     => 'if ($string =~ m/(foob)?foofoo/) { };',
    c_foobOrNowt => 'if ($string =~ $::re_foobOrNowt) { };',
    c_foob0or1   => 'if ($string =~ $::re_foob0or1 ) { };',
});

__DATA__
C:\test>201403
Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 13 wallclock secs (13.38 usr +  0.00 sys = 13.38 CPU) @ 74744.00/s (n=1000000)
c_foobOrNowt: 12 wallclock secs (11.85 usr +  0.00 sys = 11.85 CPU) @ 84409.56/s (n=1000000)
  foob0or1: 10 wallclock secs (10.63 usr +  0.00 sys = 10.63 CPU) @ 94117.65/s (n=1000000)
foobOrNowt:  8 wallclock secs ( 8.60 usr +  0.00 sys =  8.60 CPU) @ 116238.52/s (n=1000000)
                 Rate   c_foob0or1 c_foobOrNowt     foob0or1   foobOrNowt
c_foob0or1    74744/s           --         -11%         -21%         -36%
c_foobOrNowt  84410/s          13%           --         -10%         -27%
foob0or1      94118/s          26%          12%           --         -19%
foobOrNowt   116239/s          56%          38%          24%           --

C:\test>201403
After studying the searched string
Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 12 wallclock secs (12.57 usr +  0.00 sys = 12.57 CPU) @ 79567.15/s (n=1000000)
c_foobOrNowt: 12 wallclock secs (11.67 usr +  0.00 sys = 11.67 CPU) @ 85711.84/s (n=1000000)
  foob0or1: 11 wallclock secs (10.65 usr +  0.00 sys = 10.65 CPU) @ 93940.82/s (n=1000000)
foobOrNowt: 10 wallclock secs ( 8.42 usr +  0.00 sys =  8.42 CPU) @ 118736.64/s (n=1000000)
                 Rate   c_foob0or1 c_foobOrNowt     foob0or1   foobOrNowt
c_foob0or1    79567/s           --          -7%         -15%         -33%
c_foobOrNowt  85712/s           8%           --          -9%         -28%
foob0or1      93941/s          18%          10%           --         -21%
foobOrNowt   118737/s          49%          39%          26%           --

C:\test>
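
As a sanity check on the benchmark itself, it may be worth confirming that all four variants really do match the test string, and noting the one behavioural difference between the two patterns that has nothing to do with timing. This is only a minimal sketch: the lexical variable names and the %first_capture hash are my own illustration, not part of the test above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "foofoo catbar";

# Lexical copies of the two precompiled patterns (illustrative names).
my $re_foobOrNowt = qr/(foob|)foofoo/;
my $re_foob0or1   = qr/(foob)?foofoo/;

# Store $1 for each variant, but only if the match succeeded.
my %first_capture;
$first_capture{foobOrNowt}   = $1 if $string =~ m/(foob|)foofoo/;
$first_capture{foob0or1}     = $1 if $string =~ m/(foob)?foofoo/;
$first_capture{c_foobOrNowt} = $1 if $string =~ $re_foobOrNowt;
$first_capture{c_foob0or1}   = $1 if $string =~ $re_foob0or1;

# Report whether $1 was defined for each variant that matched.
for my $name (sort keys %first_capture) {
    my $capture = $first_capture{$name};
    printf "%-12s matched; \$1 is %s\n", $name,
        defined $capture ? "'$capture'" : 'undef';
}
```

On this string the (foob|) forms should leave $1 as an empty string, while the (foob)? forms leave it undefined, because the optional group never takes part in the match.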

Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!

Re: Re: Difference between (foo|) and (foo)?
by Anonymous Monk on Sep 30, 2002 at 19:53 UTC
    You went to a bit of trouble to analyse this, so I thought I'd repay you (hey, I work in QA all day -- test methodology is something I should know about!). At the end of this post I show that your test results are valid, but my conclusion is that your method is flawed. In your test, each pre-compiled expression is held in a variable, which means it has to be pushed onto the stack and interpolated, whereas the non-precompiled expressions were inline. To make the comparison consistent, I put the non-precompiled regular expressions into variables as well (so they too have to be pushed onto the stack and interpolated, and all that jazz), and got different results.

    Here's the code:

    
    #!/usr/bin/perl
    no warnings;
    use strict;
    use Benchmark qw(cmpthese);
    
    $::string = "foofoo catbar";
    $::re_foobOrNowt     = qr/(foob|)foofoo/o;
    $::re_foob0or1        = qr/(foob)?foofoo/o;
    $::foobOrNowt = qr/(foob|)foofoo/;
    $::foob0or1   = qr/(foob)?foofoo/;
    
    #study $::string; print 'After studying the searched string'.$/;
    cmpthese( 1000000, {
        foobOrNowt    => 'if ($string =~ $::foobOrNowt) { };',
        foob0or1    => 'if ($string =~ $::foob0or1) { };',
        c_foobOrNowt=> 'if ($string =~ $::re_foobOrNowt) { };',
        c_foob0or1    => 'if ($string =~ $::re_foob0or1  ) { };',
    });
    
    Here are the results:
    
    ddouville@linuxdld:~> ./test2.pl 
    Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
    c_foob0or1:  3 wallclock secs ( 1.92 usr +  0.00 sys =  1.92 CPU) @ 520833.33/s (n=1000000)
    c_foobOrNowt:  2 wallclock secs ( 1.73 usr +  0.00 sys =  1.73 CPU) @ 578034.68/s (n=1000000)
      foob0or1:  1 wallclock secs ( 1.96 usr +  0.00 sys =  1.96 CPU) @ 510204.08/s (n=1000000)
    foobOrNowt:  2 wallclock secs ( 1.99 usr +  0.02 sys =  2.01 CPU) @ 497512.44/s (n=1000000)
                     Rate   foobOrNowt     foob0or1   c_foob0or1 c_foobOrNowt
    foobOrNowt   497512/s           --          -2%          -4%         -14%
    foob0or1     510204/s           3%           --          -2%         -12%
    c_foob0or1   520833/s           5%           2%           --         -10%
    c_foobOrNowt 578035/s          16%          13%          11%           --
    
    Code, this time with a 10000-iteration loop inside each test body and 1000 outer iterations:
    
    #!/usr/bin/perl
    no warnings;
    use strict;
    use Benchmark qw(cmpthese);
    
    $::string = "foofoo catbar";
    $::re_foobOrNowt     = qr/(foob|)foofoo/o;
    $::re_foob0or1        = qr/(foob)?foofoo/o;
    $::foobOrNowt = qr/(foob|)foofoo/;
    $::foob0or1   = qr/(foob)?foofoo/;
    
    #study $::string; print 'After studying the searched string'.$/;
    cmpthese( 1000, {
        foobOrNowt    => 'for (1..10000) { if ($string =~ $::foobOrNowt) { };}',
        foob0or1    => 'for (1..10000) { if ($string =~ $::foob0or1) { };}',
        c_foobOrNowt=> 'for (1..10000) { if ($string =~ $::re_foobOrNowt) { };}',
        c_foob0or1    => 'for (1..10000) { if ($string =~ $::re_foob0or1  ) { };}',
    });
    
    Results:
    
    ddouville@linuxdld:~> ./test2.pl
    Benchmark: timing 1000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
    c_foob0or1: 16 wallclock secs (15.97 usr +  0.00 sys = 15.97 CPU) @ 62.62/s (n=1000)
    c_foobOrNowt: 16 wallclock secs (16.25 usr +  0.00 sys = 16.25 CPU) @ 61.54/s (n=1000)
      foob0or1: 17 wallclock secs (16.34 usr +  0.00 sys = 16.34 CPU) @ 61.20/s (n=1000)
    foobOrNowt: 17 wallclock secs (17.32 usr +  0.00 sys = 17.32 CPU) @ 57.74/s (n=1000)
                   Rate   foobOrNowt     foob0or1 c_foobOrNowt   c_foob0or1
    foobOrNowt   57.7/s           --          -6%          -6%          -8%
    foob0or1     61.2/s           6%           --          -1%          -2%
    c_foobOrNowt 61.5/s           7%           1%           --          -2%
    c_foob0or1   62.6/s           8%           2%           2%           --
    
    My tests show a small performance increase for the compiled versions (between 1% and 16% in the tables above, depending on the run and the pattern).
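
    A further variation, offered only as a sketch: pass code refs instead of strings and keep everything in lexicals, so that all four cases go through the same sub-call path and only the form of the pattern differs. The lexical names, the %variants hash, and the choice of running each case for about one CPU second (the negative count) are assumptions for illustration; I make no claim about what timings it produces.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $string = "foofoo catbar";

    # Precompiled patterns held in lexicals (illustrative names).
    my $re_foobOrNowt = qr/(foob|)foofoo/;
    my $re_foob0or1   = qr/(foob)?foofoo/;

    # Each variant is a code ref, so every case pays the same call overhead.
    my %variants = (
        foobOrNowt   => sub { $string =~ m/(foob|)foofoo/ },
        foob0or1     => sub { $string =~ m/(foob)?foofoo/ },
        c_foobOrNowt => sub { $string =~ $re_foobOrNowt },
        c_foob0or1   => sub { $string =~ $re_foob0or1 },
    );

    # A negative count asks Benchmark to run each case for at least
    # that many CPU seconds instead of a fixed number of iterations.
    cmpthese( -1, \%variants );
    ```

    Letting Benchmark choose the iteration count also avoids having to guess a number large enough to give stable timings on a fast machine.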

      Finally, here's your initial test (unaltered), run on my own machine for baseline comparison:
      
      ddouville@linuxdld:~> ./test2.pl
      Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
      c_foob0or1:  2 wallclock secs ( 1.92 usr +  0.00 sys =  1.92 CPU) @ 520833.33/s (n=1000000)
      c_foobOrNowt:  1 wallclock secs ( 1.78 usr +  0.00 sys =  1.78 CPU) @ 561797.75/s (n=1000000)
        foob0or1:  0 wallclock secs ( 1.33 usr +  0.00 sys =  1.33 CPU) @ 751879.70/s (n=1000000)
      foobOrNowt:  1 wallclock secs ( 1.34 usr +  0.00 sys =  1.34 CPU) @ 746268.66/s (n=1000000)
                       Rate   c_foob0or1 c_foobOrNowt   foobOrNowt     foob0or1
      c_foob0or1   520833/s           --          -7%         -30%         -31%
      c_foobOrNowt 561798/s           8%           --         -25%         -25%
      foobOrNowt   746269/s          43%          33%           --          -1%
      foob0or1     751880/s          44%          34%           1%           --
      
      These results agree with yours, which supports the conclusion that your results are correct for the test you performed.