
All you've done in this test is change the weighting of the strings being matched in favor of the (foo)? method, whereas my original string "foofoo catbar" quite possibly favors (foo|). That said, I like the thought behind your approach.

I altered the test to benchmark each string separately:


#!/usr/bin/perl -w

use strict;

use Benchmark 'cmpthese';

my @string = (
        "foofoo catbar",
        "foofoofoo catbar",
        "foo foo cat bar",
        "foo flew over the",
        "cufoofoo nest",
);

cmpthese(1500000, {
        # (foob|) form: alternation with an empty branch
        foo_or_0 => sub { $string[0] =~ /^(foob|)foofoo/ },
        foo_or_1 => sub { $string[1] =~ /^(foob|)foofoo/ },
        foo_or_2 => sub { $string[2] =~ /^(foob|)foofoo/ },
        foo_or_3 => sub { $string[3] =~ /^(foob|)foofoo/ },
        foo_or_4 => sub { $string[4] =~ /^(foob|)foofoo/ },

        # (foob)? form: optional group
        foo_qs_0 => sub { $string[0] =~ /^(foob)?foofoo/ },
        foo_qs_1 => sub { $string[1] =~ /^(foob)?foofoo/ },
        foo_qs_2 => sub { $string[2] =~ /^(foob)?foofoo/ },
        foo_qs_3 => sub { $string[3] =~ /^(foob)?foofoo/ },
        foo_qs_4 => sub { $string[4] =~ /^(foob)?foofoo/ },
});

If we benchmark the strings separately, we get the following:


ddouville@linuxdld:~> ./test.pl
Benchmark: timing 1500000 iterations of foo_or_0, foo_or_1, foo_or_2, foo_or_3, foo_or_4, foo_qs_0, foo_qs_1, foo_qs_2, foo_qs_3, foo_qs_4...
  foo_or_0:  2 wallclock secs ( 2.21 usr +  0.00 sys =  2.21 CPU) @ 678733.03/s (n=1500000)
  foo_or_1:  3 wallclock secs ( 2.13 usr +  0.00 sys =  2.13 CPU) @ 704225.35/s (n=1500000)
  foo_or_2:  0 wallclock secs ( 0.72 usr +  0.00 sys =  0.72 CPU) @ 2083333.33/s (n=1500000)
  foo_or_3: -1 wallclock secs ( 0.50 usr + -0.01 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
  foo_or_4:  1 wallclock secs ( 1.26 usr +  0.00 sys =  1.26 CPU) @ 1190476.19/s (n=1500000)
  foo_qs_0:  1 wallclock secs ( 2.07 usr +  0.00 sys =  2.07 CPU) @ 724637.68/s (n=1500000)
  foo_qs_1:  2 wallclock secs ( 2.02 usr +  0.00 sys =  2.02 CPU) @ 742574.26/s (n=1500000)
  foo_qs_2:  0 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 2272727.27/s (n=1500000)
  foo_qs_3:  2 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
  foo_qs_4:  2 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 1442307.69/s (n=1500000)
              Rate foo_or_0 foo_or_1 foo_qs_0 foo_qs_1 foo_or_4 foo_qs_4 foo_or_2 foo_qs_2 foo_qs_3 foo_or_3
foo_or_0  678733/s       --      -4%      -6%      -9%     -43%     -53%     -67%     -70%     -78%     -78%
foo_or_1  704225/s       4%       --      -3%      -5%     -41%     -51%     -66%     -69%     -77%     -77%
foo_qs_0  724638/s       7%       3%       --      -2%     -39%     -50%     -65%     -68%     -76%     -76%
foo_qs_1  742574/s       9%       5%       2%       --     -38%     -49%     -64%     -67%     -76%     -76%
foo_or_4 1190476/s      75%      69%      64%      60%       --     -17%     -43%     -48%     -61%     -61%
foo_qs_4 1442308/s     113%     105%      99%      94%      21%       --     -31%     -37%     -53%     -53%
foo_or_2 2083333/s     207%     196%     188%     181%      75%      44%       --      -8%     -32%     -32%
foo_qs_2 2272727/s     235%     223%     214%     206%      91%      58%       9%       --     -26%     -26%
foo_qs_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       --      -0%
foo_or_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       0%       --

Interesting results. For "foofoo catbar" (my original string), Benchmark reports the opposite of what the UNIX 'time' command did. Am I reading the results correctly? I compared each foo_or_N to its foo_qs_N counterpart, and this is what I have:

"foofoo catbar":          QS wins (OR is 6% slower)
"foofoofoo catbar":       QS wins (OR is 5% slower)
"foo foo cat bar":        QS wins (OR is 8% slower)
"foo flew over the":      OR and QS tie (0%)
"cufoofoo nest":          QS wins (OR is 17% slower)
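
As a cross-check on that reading, here is a minimal sketch that recomputes the pairwise percentages from the rates printed above. It assumes cmpthese derives each chart cell as 100 * (row rate / column rate - 1), which agrees with the numbers in the chart. The %rate hash and the pct_diff helper are names made up for this illustration, not part of Benchmark.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rates (iterations per second) copied from the Benchmark output above.
my %rate = (
    foo_or_0 => 678733.03,  foo_qs_0 => 724637.68,
    foo_or_1 => 704225.35,  foo_qs_1 => 742574.26,
    foo_or_2 => 2083333.33, foo_qs_2 => 2272727.27,
    foo_or_3 => 3061224.49, foo_qs_3 => 3061224.49,
    foo_or_4 => 1190476.19, foo_qs_4 => 1442307.69,
);

# Hypothetical helper: percentage by which the row rate differs from the
# column rate (assumed to be how the comparison chart cells are computed).
sub pct_diff {
    my ($row_rate, $col_rate) = @_;
    return 100 * ($row_rate / $col_rate - 1);
}

# Compare each OR variant against the QS variant for the same string.
for my $n (0 .. 4) {
    my $or = $rate{"foo_or_$n"};
    my $qs = $rate{"foo_qs_$n"};
    printf "string %d: OR relative to QS %+.0f%%, QS relative to OR %+.0f%%\n",
        $n, pct_diff($or, $qs), pct_diff($qs, $or);
}
```

A negative value means the row entry is slower than the column entry, which is why the figures I read off the OR rows carry a minus sign.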

That leaves a question: is there any pattern or input for which OR can win?

In reply to Re: Re: Difference between (foo|) and (foo)? by Anonymous Monk
in thread Difference between (foo|) and (foo)? by Derek2
