Difference between (foo|) and (foo)?

Derek2 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Difference between (foo\|) and (foo)? by tadman (Prior) on Sep 28, 2002 at 05:31 UTC
Okay, hold the phone. If you're going to compare things in a scientific manner, at least give the question mark variation a fighting chance: `#!/usr/bin/perl -w use strict; use Benchmark 'cmpthese'; my @string = ( "foofoo catbar", "foofoofoo catbar", "foo foo cat bar", "foo flew over the", "cufoofoo nest", ); cmpthese(50_000, { foo_or => sub { /^(foob\|)foofoo/ foreach (@string) }, foo_qs => sub { /^(foob)?foofoo/ foreach (@string) } });` [download] With that in mind, the results are different: `Benchmark: timing 50000 iterations of foo_or, foo_qs... foo_or: 2 wallclock secs ( 1.28 usr + 0.00 sys = 1.28 CPU) @ 39 +062.50/s (n=50000) foo_qs: 2 wallclock secs ( 1.21 usr + 0.00 sys = 1.21 CPU) @ 41 +322.31/s (n=50000) Rate foo_or foo_qs foo_or 39062/s -- -5% foo_qs 41322/s 6% --` [download] The variance changes, but the question mark wins every time.	[reply] [d/l] [select]
Re: Re: Difference between (foo\|) and (foo)? by traveler (Parson) on Sep 28, 2002 at 18:28 UTC
I added `foo_ro => sub { /^(\|foob)foofoo/ foreach (@string) }` (and upped the iterations) and got `Benchmark: timing 500000 iterations of foo_or, foo_qs, foo_ro... foo_or: 3 wallclock secs ( 2.55 usr + 0.00 sys = 2.55 CPU) @ 19 +6386.49/s (n=500000) foo_qs: 3 wallclock secs ( 2.45 usr + 0.00 sys = 2.45 CPU) @ 20 +3832.04/s (n=500000) foo_ro: 2 wallclock secs ( 2.39 usr + 0.00 sys = 2.39 CPU) @ 20 +9205.02/s (n=500000) Rate foo_or foo_qs foo_ro foo_or 196386/s -- -4% -6% foo_qs 203832/s 4% -- -3% foo_ro 209205/s 7% 3% --` [download] This shows the ro doing better than even the qs. I suppose that `(foob\|)` vs `(\|foob)` depends on the data(?). --traveler	[reply] [d/l] [select]
Re: Re: Re: Difference between (foo\|) and (foo)? by theorbtwo (Prior) on Sep 29, 2002 at 01:00 UTC
`(\|foob)` is the same as `(foob)??`; `(foob\|)` would be `(foob)?`, so comparing them isn't really fair. Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).	[reply] [d/l] [select]
Re: Re: Difference between (foo\|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:07 UTC
<html> All you've done in this test is change the weighting of the string to match against in favor of the (foo)? method, in contrast to my original string "foofoo catbar" which is quite possibly weighted in favor of (foo\|) better performance. That said, I like the thought behind your approach. I altered the test to benchmark each string seperately: #!/usr/bin/perl -w use strict; use Benchmark 'cmpthese'; my @string = ( "foofoo catbar", "foofoofoo catbar", "foo foo cat bar", "foo flew over the", "cufoofoo nest", ); cmpthese(1500000, { foo_or_0 => sub { $string[0] =~ /^(foob\|)foofoo/ }, foo_or_1 => sub { $string1 =~ /^(foob\|)foofoo/ }, foo_or_2 => sub { $string2 =~ /^(foob\|)foofoo/ }, foo_or_3 => sub { $string3 =~ /^(foob\|)foofoo/ }, foo_or_4 => sub { $string4 =~ /^(foob\|)foofoo/ }, foo_qs_0 => sub { $string[0] =~ /^(foob)?foofoo/ }, foo_qs_1 => sub { $string1 =~ /^(foob)?foofoo/ }, foo_qs_2 => sub { $string2 =~ /^(foob)?foofoo/ }, foo_qs_3 => sub { $string3 =~ /^(foob)?foofoo/ }, foo_qs_4 => sub { $string4 =~ /^(foob)?foofoo/ }, }); If we benchmark the strings seperately, we get the following: ddouville@linuxdld:~> ./test.pl Benchmark: timing 1500000 iterations of foo_or_0, foo_or_1, foo_or_2, foo_or_3, foo_or_4, foo_qs_0, foo_qs_1, foo_qs_2, foo_qs_3, foo_qs_4... foo_or_0: 2 wallclock secs ( 2.21 usr + 0.00 sys = 2.21 CPU) @ 678733.03/s (n=1500000) foo_or_1: 3 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 704225.35/s (n=1500000) foo_or_2: 0 wallclock secs ( 0.72 usr + 0.00 sys = 0.72 CPU) @ 2083333.33/s (n=1500000) foo_or_3: -1 wallclock secs ( 0.50 usr + -0.01 sys = 0.49 CPU) @ 3061224.49/s (n=1500000) foo_or_4: 1 wallclock secs ( 1.26 usr + 0.00 sys = 1.26 CPU) @ 1190476.19/s (n=1500000) foo_qs_0: 1 wallclock secs ( 2.07 usr + 0.00 sys = 2.07 CPU) @ 724637.68/s (n=1500000) foo_qs_1: 2 wallclock secs ( 2.02 usr + 0.00 sys = 2.02 CPU) @ 742574.26/s (n=1500000) foo_qs_2: 0 wallclock secs ( 0.66 usr + 0.00 sys = 0.66 CPU) @ 2272727.27/s (n=1500000) foo_qs_3: 2 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 3061224.49/s (n=1500000) foo_qs_4: 2 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 1442307.69/s (n=1500000) Rate foo_or_0 foo_or_1 foo_qs_0 foo_qs_1 foo_or_4 foo_qs_4 foo_or_2 foo_qs_2 foo_qs_3 foo_or_3 foo_or_0 678733/s -- -4% -6% -9% -43% -53% -67% -70% -78% -78% foo_or_1 704225/s 4% -- -3% -5% -41% -51% -66% -69% -77% -77% foo_qs_0 724638/s 7% 3% -- -2% -39% -50% -65% -68% -76% -76% foo_qs_1 742574/s 9% 5% 2% -- -38% -49% -64% -67% -76% -76% foo_or_4 1190476/s 75% 69% 64% 60% -- -17% -43% -48% -61% -61% foo_qs_4 1442308/s 113% 105% 99% 94% 21% -- -31% -37% -53% -53% foo_or_2 2083333/s 207% 196% 188% 181% 75% 44% -- -8% -32% -32% foo_qs_2 2272727/s 235% 223% 214% 206% 91% 58% 9% -- -26% -26% foo_qs_3 3061224/s 351% 335% 322% 312% 157% 112% 47% 35% -- -0% foo_or_3 3061224/s 351% 335% 322% 312% 157% 112% 47% 35% 0% -- Interesting results. benchmark reports the opposite for "foofoo catbar" (my original string) than UNIX 'time' command did. Am I reading the results correctly? I compared foo_or_N to foo_qs_N and this is what I have: "foofoo catbar": QS wins by -6% "foofoofoo catbar": QS wins by -5% "foo foo cat bar": QS wins by -8% "foo flew over the": OR and QS tie 0% "cufoofoo nest": QS wins by -17% That leaves a question: is there a pattern situation where OR can win?	[reply]
Re: Difference between (foo\|) and (foo)? by BrowserUk (Patriarch) on Sep 28, 2002 at 06:52 UTC
This intigued me and I thought maybe the 7.5% difference was in the time it took to parse and/or compile the differences in the regexes. So I thought I'd benchmark the test with them pre-compiled. The results are very intriguing. Not only does the difference between the two compiled versions remain pretty much the same, if anything getting slightly bigger. The pre-compiled versions actually run substantially more slowly than their none pre-compiled counterparts? This is most extreme in the case of the (foob\|) version running close to 40% faster than its precompiled counterpart. I'd like to see the explanation behind them onions? Probably my test methodology at fault, but I can't see it. It took that a stage further and applied study to the searched string. This resulted in a speed-up of the slowest (precompiled (foob)?) and the fastest (the non-precompiled (foob\|)), but consistantly slowed the other two varients down. Intriguing indeed. The test code and results are below Read more... (4 kB)	[reply] [d/l]
Re: Re: Difference between (foo\|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:53 UTC
You went to a bit of trouble to analyse this, so I thought I'd repay you (hey, I work in QA all day -- test methodology is something I should know about!). At the end of this post I show that your test results are valid, but my conclusion is that your method is wrong. In your test, the pre-compiled optimized expression is contained in a variable which means it has to be pushed onto the stack and interpolated. The non-optimized expression was inline. So I changed a few things to make this more consistent and got different results. I put the non-optimized regular expressions into variables to keep the test consistent (so it has to push a variable onto the stack and interpolate it and all that jazz). Here's the code: #!/usr/bin/perl no warnings; use strict; use Benchmark qw(cmpthese); $::string = "foofoo catbar"; $::re_foobOrNowt = qr/(foob\|)foofoo/o; $::re_foob0or1 = qr/(foob)?foofoo/o; $::foobOrNowt = qr/(foob\|)foofoo/; $::foob0or1 = qr/(foob)?foofoo/; #study $::string; print 'After studying the searched string'.$/; cmpthese( 1000000, { foobOrNowt => 'if ($string =~ $::foobOrNowt) { };', foob0or1 => 'if ($string =~ $::foob0or1) { };', c_foobOrNowt=> 'if ($string =~ $::re_foobOrNowt) { };', c_foob0or1 => 'if ($string =~ $::re_foob0or1 ) { };', }); Here's the results: ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 3 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000) c_foobOrNowt: 2 wallclock secs ( 1.73 usr + 0.00 sys = 1.73 CPU) @ 578034.68/s (n=1000000) foob0or1: 1 wallclock secs ( 1.96 usr + 0.00 sys = 1.96 CPU) @ 510204.08/s (n=1000000) foobOrNowt: 2 wallclock secs ( 1.99 usr + 0.02 sys = 2.01 CPU) @ 497512.44/s (n=1000000) Rate foobOrNowt foob0or1 c_foob0or1 c_foobOrNowt foobOrNowt 497512/s -- -2% -4% -14% foob0or1 510204/s 3% -- -2% -12% c_foob0or1 520833/s 5% 2% -- -10% c_foobOrNowt 578035/s 16% 13% 11% -- Code: #!/usr/bin/perl no warnings; use strict; use Benchmark qw(cmpthese); $::string = "foofoo catbar"; $::re_foobOrNowt = qr/(foob\|)foofoo/o; $::re_foob0or1 = qr/(foob)?foofoo/o; $::foobOrNowt = qr/(foob\|)foofoo/; $::foob0or1 = qr/(foob)?foofoo/; #study $::string; print 'After studying the searched string'.$/; cmpthese( 1000, { foobOrNowt => 'for (1..10000) { if ($string =~ $::foobOrNowt) { };}', foob0or1 => 'for (1..10000) { if ($string =~ $::foob0or1) { };}', c_foobOrNowt=> 'for (1..10000) { if ($string =~ $::re_foobOrNowt) { };}', c_foob0or1 => 'for (1..10000) { if ($string =~ $::re_foob0or1 ) { };}', }); Results ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 16 wallclock secs (15.97 usr + 0.00 sys = 15.97 CPU) @ 62.62/s (n=1000) c_foobOrNowt: 16 wallclock secs (16.25 usr + 0.00 sys = 16.25 CPU) @ 61.54/s (n=1000) foob0or1: 17 wallclock secs (16.34 usr + 0.00 sys = 16.34 CPU) @ 61.20/s (n=1000) foobOrNowt: 17 wallclock secs (17.32 usr + 0.00 sys = 17.32 CPU) @ 57.74/s (n=1000) Rate foobOrNowt foob0or1 c_foobOrNowt c_foob0or1 foobOrNowt 57.7/s -- -6% -6% -8% foob0or1 61.2/s 6% -- -1% -2% c_foobOrNowt 61.5/s 7% 1% -- -2% c_foob0or1 62.6/s 8% 2% 2% -- My tests show a performance increase in the compiled versions. Finally, here's your initial test (unaltered), run on my own machine for base-line comparison: ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000) c_foobOrNowt: 1 wallclock secs ( 1.78 usr + 0.00 sys = 1.78 CPU) @ 561797.75/s (n=1000000) foob0or1: 0 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 751879.70/s (n=1000000) foobOrNowt: 1 wallclock secs ( 1.34 usr + 0.00 sys = 1.34 CPU) @ 746268.66/s (n=1000000) Rate c_foob0or1 c_foobOrNowt foobOrNowt foob0or1 c_foob0or1 520833/s -- -7% -30% -31% c_foobOrNowt 561798/s 8% -- -25% -25% foobOrNowt 746269/s 43% 33% -- -1% foob0or1 751880/s 44% 34% 1% -- These test results agree with your test results, supporting that your results are correct for the test you performed.	[reply]
Re: Re: Re: Difference between (foo\|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:57 UTC
Sorry, forgot to format that. Finally, here's your initial test (unaltered), run on my own machine for base-line comparison: ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000) c_foobOrNowt: 1 wallclock secs ( 1.78 usr + 0.00 sys = 1.78 CPU) @ 561797.75/s (n=1000000) foob0or1: 0 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 751879.70/s (n=1000000) foobOrNowt: 1 wallclock secs ( 1.34 usr + 0.00 sys = 1.34 CPU) @ 746268.66/s (n=1000000) Rate c_foob0or1 c_foobOrNowt foobOrNowt foob0or1 c_foob0or1 520833/s -- -7% -30% -31% c_foobOrNowt 561798/s 8% -- -25% -25% foobOrNowt 746269/s 43% 33% -- -1% foob0or1 751880/s 44% 34% 1% -- These test results agree with your test results, supporting that your results are correct for the test you performed.	[reply]

Replies are listed 'Best First'.
Re: Difference between (foo\|) and (foo)? by tadman (Prior) on Sep 28, 2002 at 05:31 UTC
Okay, hold the phone. If you're going to compare things in a scientific manner, at least give the question mark variation a fighting chance: `#!/usr/bin/perl -w use strict; use Benchmark 'cmpthese'; my @string = ( "foofoo catbar", "foofoofoo catbar", "foo foo cat bar", "foo flew over the", "cufoofoo nest", ); cmpthese(50_000, { foo_or => sub { /^(foob\|)foofoo/ foreach (@string) }, foo_qs => sub { /^(foob)?foofoo/ foreach (@string) } });` [download] With that in mind, the results are different: `Benchmark: timing 50000 iterations of foo_or, foo_qs... foo_or: 2 wallclock secs ( 1.28 usr + 0.00 sys = 1.28 CPU) @ 39 +062.50/s (n=50000) foo_qs: 2 wallclock secs ( 1.21 usr + 0.00 sys = 1.21 CPU) @ 41 +322.31/s (n=50000) Rate foo_or foo_qs foo_or 39062/s -- -5% foo_qs 41322/s 6% --` [download] The variance changes, but the question mark wins every time.	[reply] [d/l] [select]
Re: Re: Difference between (foo\|) and (foo)? by traveler (Parson) on Sep 28, 2002 at 18:28 UTC
I added `foo_ro => sub { /^(\|foob)foofoo/ foreach (@string) }` (and upped the iterations) and got `Benchmark: timing 500000 iterations of foo_or, foo_qs, foo_ro... foo_or: 3 wallclock secs ( 2.55 usr + 0.00 sys = 2.55 CPU) @ 19 +6386.49/s (n=500000) foo_qs: 3 wallclock secs ( 2.45 usr + 0.00 sys = 2.45 CPU) @ 20 +3832.04/s (n=500000) foo_ro: 2 wallclock secs ( 2.39 usr + 0.00 sys = 2.39 CPU) @ 20 +9205.02/s (n=500000) Rate foo_or foo_qs foo_ro foo_or 196386/s -- -4% -6% foo_qs 203832/s 4% -- -3% foo_ro 209205/s 7% 3% --` [download] This shows the ro doing better than even the qs. I suppose that `(foob\|)` vs `(\|foob)` depends on the data(?). --traveler	[reply] [d/l] [select]
Re: Re: Re: Difference between (foo\|) and (foo)? by theorbtwo (Prior) on Sep 29, 2002 at 01:00 UTC
`(\|foob)` is the same as `(foob)??`; `(foob\|)` would be `(foob)?`, so comparing them isn't really fair. Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).	[reply] [d/l] [select]
Re: Re: Difference between (foo\|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:07 UTC
<html> All you've done in this test is change the weighting of the string to match against in favor of the (foo)? method, in contrast to my original string "foofoo catbar" which is quite possibly weighted in favor of (foo\|) better performance. That said, I like the thought behind your approach. I altered the test to benchmark each string seperately: #!/usr/bin/perl -w use strict; use Benchmark 'cmpthese'; my @string = ( "foofoo catbar", "foofoofoo catbar", "foo foo cat bar", "foo flew over the", "cufoofoo nest", ); cmpthese(1500000, { foo_or_0 => sub { $string[0] =~ /^(foob\|)foofoo/ }, foo_or_1 => sub { $string1 =~ /^(foob\|)foofoo/ }, foo_or_2 => sub { $string2 =~ /^(foob\|)foofoo/ }, foo_or_3 => sub { $string3 =~ /^(foob\|)foofoo/ }, foo_or_4 => sub { $string4 =~ /^(foob\|)foofoo/ }, foo_qs_0 => sub { $string[0] =~ /^(foob)?foofoo/ }, foo_qs_1 => sub { $string1 =~ /^(foob)?foofoo/ }, foo_qs_2 => sub { $string2 =~ /^(foob)?foofoo/ }, foo_qs_3 => sub { $string3 =~ /^(foob)?foofoo/ }, foo_qs_4 => sub { $string4 =~ /^(foob)?foofoo/ }, }); If we benchmark the strings seperately, we get the following: ddouville@linuxdld:~> ./test.pl Benchmark: timing 1500000 iterations of foo_or_0, foo_or_1, foo_or_2, foo_or_3, foo_or_4, foo_qs_0, foo_qs_1, foo_qs_2, foo_qs_3, foo_qs_4... foo_or_0: 2 wallclock secs ( 2.21 usr + 0.00 sys = 2.21 CPU) @ 678733.03/s (n=1500000) foo_or_1: 3 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 704225.35/s (n=1500000) foo_or_2: 0 wallclock secs ( 0.72 usr + 0.00 sys = 0.72 CPU) @ 2083333.33/s (n=1500000) foo_or_3: -1 wallclock secs ( 0.50 usr + -0.01 sys = 0.49 CPU) @ 3061224.49/s (n=1500000) foo_or_4: 1 wallclock secs ( 1.26 usr + 0.00 sys = 1.26 CPU) @ 1190476.19/s (n=1500000) foo_qs_0: 1 wallclock secs ( 2.07 usr + 0.00 sys = 2.07 CPU) @ 724637.68/s (n=1500000) foo_qs_1: 2 wallclock secs ( 2.02 usr + 0.00 sys = 2.02 CPU) @ 742574.26/s (n=1500000) foo_qs_2: 0 wallclock secs ( 0.66 usr + 0.00 sys = 0.66 CPU) @ 2272727.27/s (n=1500000) foo_qs_3: 2 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU) @ 3061224.49/s (n=1500000) foo_qs_4: 2 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 1442307.69/s (n=1500000) Rate foo_or_0 foo_or_1 foo_qs_0 foo_qs_1 foo_or_4 foo_qs_4 foo_or_2 foo_qs_2 foo_qs_3 foo_or_3 foo_or_0 678733/s -- -4% -6% -9% -43% -53% -67% -70% -78% -78% foo_or_1 704225/s 4% -- -3% -5% -41% -51% -66% -69% -77% -77% foo_qs_0 724638/s 7% 3% -- -2% -39% -50% -65% -68% -76% -76% foo_qs_1 742574/s 9% 5% 2% -- -38% -49% -64% -67% -76% -76% foo_or_4 1190476/s 75% 69% 64% 60% -- -17% -43% -48% -61% -61% foo_qs_4 1442308/s 113% 105% 99% 94% 21% -- -31% -37% -53% -53% foo_or_2 2083333/s 207% 196% 188% 181% 75% 44% -- -8% -32% -32% foo_qs_2 2272727/s 235% 223% 214% 206% 91% 58% 9% -- -26% -26% foo_qs_3 3061224/s 351% 335% 322% 312% 157% 112% 47% 35% -- -0% foo_or_3 3061224/s 351% 335% 322% 312% 157% 112% 47% 35% 0% -- Interesting results. benchmark reports the opposite for "foofoo catbar" (my original string) than UNIX 'time' command did. Am I reading the results correctly? I compared foo_or_N to foo_qs_N and this is what I have: "foofoo catbar": QS wins by -6% "foofoofoo catbar": QS wins by -5% "foo foo cat bar": QS wins by -8% "foo flew over the": OR and QS tie 0% "cufoofoo nest": QS wins by -17% That leaves a question: is there a pattern situation where OR can win?	[reply]
Re: Difference between (foo\|) and (foo)? by BrowserUk (Patriarch) on Sep 28, 2002 at 06:52 UTC
This intigued me and I thought maybe the 7.5% difference was in the time it took to parse and/or compile the differences in the regexes. So I thought I'd benchmark the test with them pre-compiled. The results are very intriguing. Not only does the difference between the two compiled versions remain pretty much the same, if anything getting slightly bigger. The pre-compiled versions actually run substantially more slowly than their none pre-compiled counterparts? This is most extreme in the case of the (foob\|) version running close to 40% faster than its precompiled counterpart. I'd like to see the explanation behind them onions? Probably my test methodology at fault, but I can't see it. It took that a stage further and applied study to the searched string. This resulted in a speed-up of the slowest (precompiled (foob)?) and the fastest (the non-precompiled (foob\|)), but consistantly slowed the other two varients down. Intriguing indeed. The test code and results are below Read more... (4 kB)	[reply] [d/l]
Re: Re: Difference between (foo\|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:53 UTC
You went to a bit of trouble to analyse this, so I thought I'd repay you (hey, I work in QA all day -- test methodology is something I should know about!). At the end of this post I show that your test results are valid, but my conclusion is that your method is wrong. In your test, the pre-compiled optimized expression is contained in a variable which means it has to be pushed onto the stack and interpolated. The non-optimized expression was inline. So I changed a few things to make this more consistent and got different results. I put the non-optimized regular expressions into variables to keep the test consistent (so it has to push a variable onto the stack and interpolate it and all that jazz). Here's the code: #!/usr/bin/perl no warnings; use strict; use Benchmark qw(cmpthese); $::string = "foofoo catbar"; $::re_foobOrNowt = qr/(foob\|)foofoo/o; $::re_foob0or1 = qr/(foob)?foofoo/o; $::foobOrNowt = qr/(foob\|)foofoo/; $::foob0or1 = qr/(foob)?foofoo/; #study $::string; print 'After studying the searched string'.$/; cmpthese( 1000000, { foobOrNowt => 'if ($string =~ $::foobOrNowt) { };', foob0or1 => 'if ($string =~ $::foob0or1) { };', c_foobOrNowt=> 'if ($string =~ $::re_foobOrNowt) { };', c_foob0or1 => 'if ($string =~ $::re_foob0or1 ) { };', }); Here's the results: ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 3 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000) c_foobOrNowt: 2 wallclock secs ( 1.73 usr + 0.00 sys = 1.73 CPU) @ 578034.68/s (n=1000000) foob0or1: 1 wallclock secs ( 1.96 usr + 0.00 sys = 1.96 CPU) @ 510204.08/s (n=1000000) foobOrNowt: 2 wallclock secs ( 1.99 usr + 0.02 sys = 2.01 CPU) @ 497512.44/s (n=1000000) Rate foobOrNowt foob0or1 c_foob0or1 c_foobOrNowt foobOrNowt 497512/s -- -2% -4% -14% foob0or1 510204/s 3% -- -2% -12% c_foob0or1 520833/s 5% 2% -- -10% c_foobOrNowt 578035/s 16% 13% 11% -- Code: #!/usr/bin/perl no warnings; use strict; use Benchmark qw(cmpthese); $::string = "foofoo catbar"; $::re_foobOrNowt = qr/(foob\|)foofoo/o; $::re_foob0or1 = qr/(foob)?foofoo/o; $::foobOrNowt = qr/(foob\|)foofoo/; $::foob0or1 = qr/(foob)?foofoo/; #study $::string; print 'After studying the searched string'.$/; cmpthese( 1000, { foobOrNowt => 'for (1..10000) { if ($string =~ $::foobOrNowt) { };}', foob0or1 => 'for (1..10000) { if ($string =~ $::foob0or1) { };}', c_foobOrNowt=> 'for (1..10000) { if ($string =~ $::re_foobOrNowt) { };}', c_foob0or1 => 'for (1..10000) { if ($string =~ $::re_foob0or1 ) { };}', }); Results ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 16 wallclock secs (15.97 usr + 0.00 sys = 15.97 CPU) @ 62.62/s (n=1000) c_foobOrNowt: 16 wallclock secs (16.25 usr + 0.00 sys = 16.25 CPU) @ 61.54/s (n=1000) foob0or1: 17 wallclock secs (16.34 usr + 0.00 sys = 16.34 CPU) @ 61.20/s (n=1000) foobOrNowt: 17 wallclock secs (17.32 usr + 0.00 sys = 17.32 CPU) @ 57.74/s (n=1000) Rate foobOrNowt foob0or1 c_foobOrNowt c_foob0or1 foobOrNowt 57.7/s -- -6% -6% -8% foob0or1 61.2/s 6% -- -1% -2% c_foobOrNowt 61.5/s 7% 1% -- -2% c_foob0or1 62.6/s 8% 2% 2% -- My tests show a performance increase in the compiled versions. Finally, here's your initial test (unaltered), run on my own machine for base-line comparison: ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000) c_foobOrNowt: 1 wallclock secs ( 1.78 usr + 0.00 sys = 1.78 CPU) @ 561797.75/s (n=1000000) foob0or1: 0 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 751879.70/s (n=1000000) foobOrNowt: 1 wallclock secs ( 1.34 usr + 0.00 sys = 1.34 CPU) @ 746268.66/s (n=1000000) Rate c_foob0or1 c_foobOrNowt foobOrNowt foob0or1 c_foob0or1 520833/s -- -7% -30% -31% c_foobOrNowt 561798/s 8% -- -25% -25% foobOrNowt 746269/s 43% 33% -- -1% foob0or1 751880/s 44% 34% 1% -- These test results agree with your test results, supporting that your results are correct for the test you performed.	[reply]
Re: Re: Re: Difference between (foo\|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:57 UTC
Sorry, forgot to format that. Finally, here's your initial test (unaltered), run on my own machine for base-line comparison: ddouville@linuxdld:~> ./test2.pl Benchmark: timing 1000000 iterations of c_foob0or1, c_foobOrNowt, foob0or1, foobOrNowt... c_foob0or1: 2 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000) c_foobOrNowt: 1 wallclock secs ( 1.78 usr + 0.00 sys = 1.78 CPU) @ 561797.75/s (n=1000000) foob0or1: 0 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) @ 751879.70/s (n=1000000) foobOrNowt: 1 wallclock secs ( 1.34 usr + 0.00 sys = 1.34 CPU) @ 746268.66/s (n=1000000) Rate c_foob0or1 c_foobOrNowt foobOrNowt foob0or1 c_foob0or1 520833/s -- -7% -30% -31% c_foobOrNowt 561798/s 8% -- -25% -25% foobOrNowt 746269/s 43% 33% -- -1% foob0or1 751880/s 44% 34% 1% -- These test results agree with your test results, supporting that your results are correct for the test you performed.	[reply]