Re: Re: Difference between (foo|) and (foo)?

All you've done in this test is shift the weighting of the test strings in favor of the (foo)? method, whereas my original string, "foofoo catbar", was quite possibly weighted in favor of (foo|). That said, I like the thought behind your approach.

I altered the test to benchmark each string separately:


#!/usr/bin/perl -w

use strict;

use Benchmark 'cmpthese';

my @string = (
        "foofoo catbar",
        "foofoofoo catbar",
        "foo foo cat bar",
        "foo flew over the",
        "cufoofoo nest",
);

cmpthese(1500000, {
        # (foob|): alternation with an empty branch
        foo_or_0 => sub { $string[0] =~ /^(foob|)foofoo/ },
        foo_or_1 => sub { $string[1] =~ /^(foob|)foofoo/ },
        foo_or_2 => sub { $string[2] =~ /^(foob|)foofoo/ },
        foo_or_3 => sub { $string[3] =~ /^(foob|)foofoo/ },
        foo_or_4 => sub { $string[4] =~ /^(foob|)foofoo/ },

        # (foob)?: optional group
        foo_qs_0 => sub { $string[0] =~ /^(foob)?foofoo/ },
        foo_qs_1 => sub { $string[1] =~ /^(foob)?foofoo/ },
        foo_qs_2 => sub { $string[2] =~ /^(foob)?foofoo/ },
        foo_qs_3 => sub { $string[3] =~ /^(foob)?foofoo/ },
        foo_qs_4 => sub { $string[4] =~ /^(foob)?foofoo/ },
});
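
A table like this is easy to mis-index by hand, so here is a minimal sketch that builds the same subs in a loop. This is not the script I actually ran: %tests, @agree, $s and the command-line iteration count are names and choices I made up for illustration. It also checks up front that both patterns agree on match/no-match for every string, so the timing compares like with like.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my @string = (
    "foofoo catbar",
    "foofoofoo catbar",
    "foo foo cat bar",
    "foo flew over the",
    "cufoofoo nest",
);

my (%tests, @agree);
for my $i (0 .. $#string) {
    my $s = $string[$i];    # fresh lexical each iteration, captured by the closures

    # Sanity check: both patterns should agree on whether the string matches.
    my $or = $s =~ /^(foob|)foofoo/ ? 1 : 0;
    my $qs = $s =~ /^(foob)?foofoo/ ? 1 : 0;
    push @agree, $or == $qs ? 1 : 0;

    $tests{"foo_or_$i"} = sub { $s =~ /^(foob|)foofoo/ };
    $tests{"foo_qs_$i"} = sub { $s =~ /^(foob)?foofoo/ };
}

# Hypothetical convention: only run the timing when a count is given,
# e.g. ./test.pl 1500000
my $count = shift @ARGV;
cmpthese($count, \%tests) if $count;
```

One thing to keep in mind when comparing the two forms: when the group does not take part in the match, (foob|) leaves $1 as an empty string while (foob)? leaves it undefined, so the two are not interchangeable if you use the capture afterwards.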

If we benchmark the strings separately, we get the following:


ddouville@linuxdld:~> ./test.pl
Benchmark: timing 1500000 iterations of foo_or_0, foo_or_1, foo_or_2, foo_or_3, foo_or_4, foo_qs_0, foo_qs_1, foo_qs_2, foo_qs_3, foo_qs_4...
  foo_or_0:  2 wallclock secs ( 2.21 usr +  0.00 sys =  2.21 CPU) @ 678733.03/s (n=1500000)
  foo_or_1:  3 wallclock secs ( 2.13 usr +  0.00 sys =  2.13 CPU) @ 704225.35/s (n=1500000)
  foo_or_2:  0 wallclock secs ( 0.72 usr +  0.00 sys =  0.72 CPU) @ 2083333.33/s (n=1500000)
  foo_or_3: -1 wallclock secs ( 0.50 usr + -0.01 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
  foo_or_4:  1 wallclock secs ( 1.26 usr +  0.00 sys =  1.26 CPU) @ 1190476.19/s (n=1500000)
  foo_qs_0:  1 wallclock secs ( 2.07 usr +  0.00 sys =  2.07 CPU) @ 724637.68/s (n=1500000)
  foo_qs_1:  2 wallclock secs ( 2.02 usr +  0.00 sys =  2.02 CPU) @ 742574.26/s (n=1500000)
  foo_qs_2:  0 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 2272727.27/s (n=1500000)
  foo_qs_3:  2 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
  foo_qs_4:  2 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 1442307.69/s (n=1500000)
              Rate foo_or_0 foo_or_1 foo_qs_0 foo_qs_1 foo_or_4 foo_qs_4 foo_or_2 foo_qs_2 foo_qs_3 foo_or_3
foo_or_0  678733/s       --      -4%      -6%      -9%     -43%     -53%     -67%     -70%     -78%     -78%
foo_or_1  704225/s       4%       --      -3%      -5%     -41%     -51%     -66%     -69%     -77%     -77%
foo_qs_0  724638/s       7%       3%       --      -2%     -39%     -50%     -65%     -68%     -76%     -76%
foo_qs_1  742574/s       9%       5%       2%       --     -38%     -49%     -64%     -67%     -76%     -76%
foo_or_4 1190476/s      75%      69%      64%      60%       --     -17%     -43%     -48%     -61%     -61%
foo_qs_4 1442308/s     113%     105%      99%      94%      21%       --     -31%     -37%     -53%     -53%
foo_or_2 2083333/s     207%     196%     188%     181%      75%      44%       --      -8%     -32%     -32%
foo_qs_2 2272727/s     235%     223%     214%     206%      91%      58%       9%       --     -26%     -26%
foo_qs_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       --      -0%
foo_or_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       0%       --

Interesting results. For "foofoo catbar" (my original string), Benchmark reports the opposite of what the UNIX 'time' command did. Am I reading the results correctly? I compared each foo_or_N with its foo_qs_N, reading the percentage from the foo_or_N row, and this is what I get:

"foofoo catbar":          QS wins (OR is 6% slower)
"foofoofoo catbar":       QS wins (OR is 5% slower)
"foo foo cat bar":        QS wins (OR is 8% slower)
"foo flew over the":      OR and QS tie (0%)
"cufoofoo nest":          QS wins (OR is 17% slower)

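For anyone who wants to check my reading of the chart, the per-string figures can be recomputed from the rates in the raw output. A minimal sketch, assuming the chart cell is simply the row rate relative to the column rate; @rates and @pct are names I made up, and the numbers are copied from the Benchmark lines above. It comes out to the same -6, -5, -8, 0 and -17 as in my list.

```perl
use strict;
use warnings;

# [ string, OR rate, QS rate ] in iterations per second, from the output above
my @rates = (
    [ "foofoo catbar",      678733.03,  724637.68 ],
    [ "foofoofoo catbar",   704225.35,  742574.26 ],
    [ "foo foo cat bar",   2083333.33, 2272727.27 ],
    [ "foo flew over the", 3061224.49, 3061224.49 ],
    [ "cufoofoo nest",     1190476.19, 1442307.69 ],
);

my @pct;
for my $r (@rates) {
    my ($str, $or, $qs) = @$r;

    # OR rate relative to QS rate, as a rounded percentage (negative: OR is slower)
    my $diff = sprintf "%.0f", 100 * ($or / $qs - 1);
    push @pct, $diff;

    printf "%-22s OR vs QS: %s%%\n", qq("$str"), $diff;
}
```
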
That leaves a question: is there any pattern or input for which OR can win?
