Re: Difference between (foo|) and (foo)?

This intrigued me, and I thought maybe the 7.5% difference was in the time it took to parse and/or compile the two regexes, so I benchmarked the test with them pre-compiled as well. The results are very intriguing. Not only does the difference between the two compiled versions remain pretty much the same (if anything, it gets slightly bigger), but the pre-compiled versions actually run substantially more slowly than their non-pre-compiled counterparts. This is most extreme for the (foob|) version, which runs close to 40% faster than its pre-compiled counterpart.

I'd like to see the explanation behind them onions? Probably my test methodology at fault, but I can't see it.
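
Timing aside, the two patterns are not quite interchangeable, which may be worth keeping in mind when comparing them. A minimal sketch, using the same test string as the benchmark (the variable names $alt and $opt are made up for illustration): with (foob|) the group always takes part in the match, so its capture is an empty string, while with (foob)? the group is skipped and its capture is undef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "foofoo catbar";

# (foob|): the empty alternative matches, so the capture is defined but empty.
my ($alt) = $string =~ /(foob|)foofoo/;

# (foob)?: the optional group matches zero times, so the capture is undef.
my ($opt) = $string =~ /(foob)?foofoo/;

printf "(foob|) captured: %s\n", defined $alt ? "'$alt'" : 'undef';   # prints ''
printf "(foob)? captured: %s\n", defined $opt ? "'$opt'" : 'undef';   # prints undef
```

Neither benchmark case looks at the capture, so this does not explain the timings; it only shows that the two forms hand different values to any later code that uses $1.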

I took that a stage further and applied study to the searched string. This resulted in a speed-up of the slowest (the precompiled (foob)?) and the fastest (the non-precompiled (foob|)), but consistently slowed the other two variants down.

Intriguing indeed. The test code and results are below:

#!/usr/bin/perl
no warnings;
use strict;
use Benchmark qw(cmpthese);

$::string = "foofoo catbar";
$::re_foobOrNowt     = qr/(foob|)foofoo/o;
$::re_foob0or1        = qr/(foob)?foofoo/o;

#study $::string; print 'After studying the searched string'.$/;
cmpthese( 1000000, {
    foobOrNowt    => 'if ($string =~ m/(foob|)foofoo/) { };',
    foob0or1    => 'if ($string =~ m/(foob)?foofoo/) { };',
    c_foobOrNowt=> 'if ($string =~ $::re_foobOrNowt) { };',
    c_foob0or1    => 'if ($string =~ $::re_foob0or1  ) { };',
});

__DATA__

C:\test>201403
Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 13 wallclock secs (13.38 usr +  0.00 sys = 13.38 CPU) @ 74744.00/s (n=1000000)
c_foobOrNowt: 12 wallclock secs (11.85 usr +  0.00 sys = 11.85 CPU) @ 84409.56/s (n=1000000)
  foob0or1: 10 wallclock secs (10.63 usr +  0.00 sys = 10.63 CPU) @ 94117.65/s (n=1000000)
foobOrNowt:  8 wallclock secs ( 8.60 usr +  0.00 sys =  8.60 CPU) @ 116238.52/s (n=1000000)
                 Rate   c_foob0or1 c_foobOrNowt     foob0or1   foobOrNowt
c_foob0or1    74744/s           --         -11%         -21%         -36%
c_foobOrNowt  84410/s          13%           --         -10%         -27%
foob0or1      94118/s          26%          12%           --         -19%
foobOrNowt   116239/s          56%          38%          24%           --

C:\test>201403
After studying the searched string
Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 12 wallclock secs (12.57 usr +  0.00 sys = 12.57 CPU) @ 79567.15/s (n=1000000)
c_foobOrNowt: 12 wallclock secs (11.67 usr +  0.00 sys = 11.67 CPU) @ 85711.84/s (n=1000000)
  foob0or1: 11 wallclock secs (10.65 usr +  0.00 sys = 10.65 CPU) @ 93940.82/s (n=1000000)
foobOrNowt: 10 wallclock secs ( 8.42 usr +  0.00 sys =  8.42 CPU) @ 118736.64/s (n=1000000)
                 Rate   c_foob0or1 c_foobOrNowt     foob0or1   foobOrNowt
c_foob0or1    79567/s           --          -7%         -15%         -33%
c_foobOrNowt  85712/s           8%           --          -9%         -28%
foob0or1      93941/s          18%          10%           --         -21%
foobOrNowt   118737/s          49%          39%          26%           --

C:\test>

Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!

Re: Re: Difference between (foo|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:53 UTC
You went to a bit of trouble to analyse this, so I thought I'd repay you (hey, I work in QA all day -- test methodology is something I should know about!). At the end of this post I show that your test results are valid, but my conclusion is that your method is wrong.

In your test, the pre-compiled optimized expression is contained in a variable, which means it has to be pushed onto the stack and interpolated. The non-optimized expression was inline. So I changed a few things to make this more consistent and got different results. I put the non-optimized regular expressions into variables to keep the test consistent (so it has to push a variable onto the stack and interpolate it and all that jazz). Here's the code:

#!/usr/bin/perl
no warnings;
use strict;
use Benchmark qw(cmpthese);

$::string = "foofoo catbar";
$::re_foobOrNowt = qr/(foob|)foofoo/o;
$::re_foob0or1   = qr/(foob)?foofoo/o;
$::foobOrNowt    = qr/(foob|)foofoo/;
$::foob0or1      = qr/(foob)?foofoo/;

#study $::string; print 'After studying the searched string'.$/;
cmpthese( 1000000, {
    foobOrNowt   => 'if ($string =~ $::foobOrNowt) { };',
    foob0or1     => 'if ($string =~ $::foob0or1) { };',
    c_foobOrNowt => 'if ($string =~ $::re_foobOrNowt) { };',
    c_foob0or1   => 'if ($string =~ $::re_foob0or1 ) { };',
});

Here's the results:

ddouville@linuxdld:~> ./test2.pl
Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 3 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000)
c_foobOrNowt: 2 wallclock secs ( 1.73 usr + 0.00 sys = 1.73 CPU) @ 578034.68/s (n=1000000)
  foob0or1: 1 wallclock secs ( 1.96 usr + 0.00 sys = 1.96 CPU) @ 510204.08/s (n=1000000)
foobOrNowt: 2 wallclock secs ( 1.99 usr + 0.02 sys = 2.01 CPU) @ 497512.44/s (n=1000000)
                 Rate   foobOrNowt     foob0or1   c_foob0or1 c_foobOrNowt
foobOrNowt   497512/s           --          -2%          -4%         -14%
foob0or1     510204/s           3%           --          -2%         -12%
c_foob0or1   520833/s           5%           2%           --         -10%
c_foobOrNowt 578035/s          16%          13%          11%           --

Code:

#!/usr/bin/perl
no warnings;
use strict;
use Benchmark qw(cmpthese);

$::string = "foofoo catbar";
$::re_foobOrNowt = qr/(foob|)foofoo/o;
$::re_foob0or1   = qr/(foob)?foofoo/o;
$::foobOrNowt    = qr/(foob|)foofoo/;
$::foob0or1      = qr/(foob)?foofoo/;

#study $::string; print 'After studying the searched string'.$/;
cmpthese( 1000, {
    foobOrNowt   => 'for (1..10000) { if ($string =~ $::foobOrNowt) { };}',
    foob0or1     => 'for (1..10000) { if ($string =~ $::foob0or1) { };}',
    c_foobOrNowt => 'for (1..10000) { if ($string =~ $::re_foobOrNowt) { };}',
    c_foob0or1   => 'for (1..10000) { if ($string =~ $::re_foob0or1 ) { };}',
});

Results:

ddouville@linuxdld:~> ./test2.pl
Benchmark: timing 1000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 16 wallclock secs (15.97 usr + 0.00 sys = 15.97 CPU) @ 62.62/s (n=1000)
c_foobOrNowt: 16 wallclock secs (16.25 usr + 0.00 sys = 16.25 CPU) @ 61.54/s (n=1000)
  foob0or1: 17 wallclock secs (16.34 usr + 0.00 sys = 16.34 CPU) @ 61.20/s (n=1000)
foobOrNowt: 17 wallclock secs (17.32 usr + 0.00 sys = 17.32 CPU) @ 57.74/s (n=1000)
               Rate   foobOrNowt     foob0or1 c_foobOrNowt   c_foob0or1
foobOrNowt   57.7/s           --          -6%          -6%          -8%
foob0or1     61.2/s           6%           --          -1%          -2%
c_foobOrNowt 61.5/s           7%           1%           --          -2%
c_foob0or1   62.6/s           8%           2%           2%           --

My tests show a performance increase in the compiled versions.

Finally, here's your initial test (unaltered), run on my own machine for base-line comparison:

ddouville@linuxdld:~> ./test2.pl
Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000)
c_foobOrNowt: 1 wallclock secs ( 1.78 usr + 0.00 sys = 1.78 CPU) @ 561797.75/s (n=1000000)
  foob0or1: 0 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 751879.70/s (n=1000000)
foobOrNowt: 1 wallclock secs ( 1.34 usr + 0.00 sys = 1.34 CPU) @ 746268.66/s (n=1000000)
                 Rate   c_foob0or1 c_foobOrNowt   foobOrNowt     foob0or1
c_foob0or1   520833/s           --          -7%         -30%         -31%
c_foobOrNowt 561798/s           8%           --         -25%         -25%
foobOrNowt   746269/s          43%          33%           --          -1%
foob0or1     751880/s          44%          34%           1%           --

These test results agree with your test results, supporting that your results are correct for the test you performed.
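
One further way to even out the variable handling, offered only as a sketch and not as something either test above actually did, is to pass code refs to cmpthese instead of strings and to keep the compiled patterns in lexicals. The names %cases, $re_alt and $re_opt and the case labels are invented for this example, and the count of -1 (run each case for at least one CPU second) is just a convenient choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

our $string = "foofoo catbar";

# Compile each pattern once, up front.
my $re_alt = qr/(foob|)foofoo/;
my $re_opt = qr/(foob)?foofoo/;

# Every case is a code ref, so all four pay the same call overhead.
my %cases = (
    literal_alt => sub { my $hit = $string =~ /(foob|)foofoo/ },
    literal_opt => sub { my $hit = $string =~ /(foob)?foofoo/ },
    qr_alt      => sub { my $hit = $string =~ $re_alt },
    qr_opt      => sub { my $hit = $string =~ $re_opt },
);

# Sanity check: every variant must match before it is worth timing.
for my $name (sort keys %cases) {
    die "$name did not match\n" unless $cases{$name}->();
}

# A negative count asks Benchmark to run each case for at least that many CPU seconds.
cmpthese(-1, \%cases);
```

The numbers will differ by machine and perl version, so this is a way to re-run the comparison under matching conditions, not a prediction of which variant wins.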
Re: Re: Re: Difference between (foo|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:57 UTC
Sorry, forgot to format that.

Finally, here's your initial test (unaltered), run on my own machine for base-line comparison:

ddouville@linuxdld:~> ./test2.pl
Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt...
c_foob0or1: 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000)
c_foobOrNowt: 1 wallclock secs ( 1.78 usr + 0.00 sys = 1.78 CPU) @ 561797.75/s (n=1000000)
  foob0or1: 0 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 751879.70/s (n=1000000)
foobOrNowt: 1 wallclock secs ( 1.34 usr + 0.00 sys = 1.34 CPU) @ 746268.66/s (n=1000000)
                 Rate   c_foob0or1 c_foobOrNowt   foobOrNowt     foob0or1
c_foob0or1   520833/s           --          -7%         -30%         -31%
c_foobOrNowt 561798/s           8%           --         -25%         -25%
foobOrNowt   746269/s          43%          33%           --          -1%
foob0or1     751880/s          44%          34%           1%           --

These test results agree with your test results, supporting that your results are correct for the test you performed.