Re: Difference between (foo|) and (foo)?

Okay, hold the phone. If you're going to compare things in a scientific manner, at least give the question mark variation a fighting chance:

#!/usr/bin/perl -w

use strict;

use Benchmark 'cmpthese';

my @string = (
        "foofoo catbar",
        "foofoofoo catbar",
        "foo foo cat bar",
        "foo flew over the",
        "cufoofoo nest",
);

cmpthese(50_000, {
        foo_or => sub { /^(foob|)foofoo/ foreach (@string) },
        foo_qs => sub { /^(foob)?foofoo/ foreach (@string) }
});

With that in mind, the results are different:

Benchmark: timing 50000 iterations of foo_or, foo_qs...
    foo_or:  2 wallclock secs ( 1.28 usr +  0.00 sys =  1.28 CPU) @ 39062.50/s (n=50000)
    foo_qs:  2 wallclock secs ( 1.21 usr +  0.00 sys =  1.21 CPU) @ 41322.31/s (n=50000)
          Rate foo_or foo_qs
foo_or 39062/s     --    -5%
foo_qs 41322/s     6%     --

The margin varies from run to run, but the question mark version wins every time.


Replies are listed 'Best First'.
Re: Re: Difference between (foo|) and (foo)? by traveler (Parson) on Sep 28, 2002 at 18:28 UTC

I added

        foo_ro => sub { /^(|foob)foofoo/ foreach (@string) }

(and upped the iterations) and got

Benchmark: timing 500000 iterations of foo_or, foo_qs, foo_ro...
    foo_or:  3 wallclock secs ( 2.55 usr +  0.00 sys =  2.55 CPU) @ 196386.49/s (n=500000)
    foo_qs:  3 wallclock secs ( 2.45 usr +  0.00 sys =  2.45 CPU) @ 203832.04/s (n=500000)
    foo_ro:  2 wallclock secs ( 2.39 usr +  0.00 sys =  2.39 CPU) @ 209205.02/s (n=500000)
           Rate foo_or foo_qs foo_ro
foo_or 196386/s     --    -4%    -6%
foo_qs 203832/s     4%     --    -3%
foo_ro 209205/s     7%     3%     --

This shows the ro doing better than even the qs. I suppose that `(foob|)` vs `(|foob)` depends on the data(?).

--traveler

Re: Re: Re: Difference between (foo|) and (foo)? by theorbtwo (Prior) on Sep 29, 2002 at 01:00 UTC

`(|foob)` is the same as `(foob)??`; `(foob|)` would be `(foob)?`, so comparing them isn't really fair.

Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).
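
The greedy/lazy pairing described above can be checked directly. The following is only a minimal sketch: the helper name `first_capture` and the test patterns (which use `foo` followed by `(.*)` so that either branch can succeed) are made up for illustration and are not taken from the benchmark. It prints what `$1` holds for each of the four forms.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report what the first capture group holds
# after matching $re against $str.
sub first_capture {
    my ($re, $str) = @_;
    return 'no match' unless $str =~ $re;
    return defined $1 ? "'$1'" : 'undef';
}

# Each pattern is followed by (.*) so that both the "take foo" and the
# "take nothing" choices lead to a successful match; the engine keeps
# whichever choice it tries first.
my @cases = (
    [ 'greedy alternation (foo|)' => qr/^(foo|)(.*)/  ],
    [ 'greedy quantifier  (foo)?' => qr/^(foo)?(.*)/  ],
    [ 'lazy alternation   (|foo)' => qr/^(|foo)(.*)/  ],
    [ 'lazy quantifier   (foo)??' => qr/^(foo)??(.*)/ ],
);

for my $str ('foobar', 'bar') {
    print qq{string "$str":\n};
    for my $case (@cases) {
        my ($label, $re) = @$case;
        printf "    %-28s \$1 = %s\n", $label, first_capture($re, $str);
    }
}
```

On "foobar" the two greedy forms should capture `foo` and the two lazy forms should capture nothing, which is the equivalence stated above. The one visible difference is in how "nothing" is reported: the alternation forms leave `$1` as an empty string, while the quantified forms leave it undefined.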

Re: Re: Difference between (foo|) and (foo)? by Anonymous Monk on Sep 30, 2002 at 19:07 UTC

All you've done in this test is change the weighting of the strings to match against in favor of the (foo)? method, in contrast to my original string "foofoo catbar", which is quite possibly weighted in favor of (foo|). That said, I like the thought behind your approach. I altered the test to benchmark each string separately:

#!/usr/bin/perl -w

use strict;

use Benchmark 'cmpthese';

my @string = (
        "foofoo catbar",
        "foofoofoo catbar",
        "foo foo cat bar",
        "foo flew over the",
        "cufoofoo nest",
);

cmpthese(1500000, {
        foo_or_0 => sub { $string[0] =~ /^(foob|)foofoo/ },
        foo_or_1 => sub { $string[1] =~ /^(foob|)foofoo/ },
        foo_or_2 => sub { $string[2] =~ /^(foob|)foofoo/ },
        foo_or_3 => sub { $string[3] =~ /^(foob|)foofoo/ },
        foo_or_4 => sub { $string[4] =~ /^(foob|)foofoo/ },
        foo_qs_0 => sub { $string[0] =~ /^(foob)?foofoo/ },
        foo_qs_1 => sub { $string[1] =~ /^(foob)?foofoo/ },
        foo_qs_2 => sub { $string[2] =~ /^(foob)?foofoo/ },
        foo_qs_3 => sub { $string[3] =~ /^(foob)?foofoo/ },
        foo_qs_4 => sub { $string[4] =~ /^(foob)?foofoo/ },
});

If we benchmark the strings separately, we get the following:

ddouville@linuxdld:~> ./test.pl
Benchmark: timing 1500000 iterations of foo_or_0, foo_or_1, foo_or_2, foo_or_3, foo_or_4, foo_qs_0, foo_qs_1, foo_qs_2, foo_qs_3, foo_qs_4...
  foo_or_0:  2 wallclock secs ( 2.21 usr +  0.00 sys =  2.21 CPU) @ 678733.03/s (n=1500000)
  foo_or_1:  3 wallclock secs ( 2.13 usr +  0.00 sys =  2.13 CPU) @ 704225.35/s (n=1500000)
  foo_or_2:  0 wallclock secs ( 0.72 usr +  0.00 sys =  0.72 CPU) @ 2083333.33/s (n=1500000)
  foo_or_3: -1 wallclock secs ( 0.50 usr + -0.01 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
  foo_or_4:  1 wallclock secs ( 1.26 usr +  0.00 sys =  1.26 CPU) @ 1190476.19/s (n=1500000)
  foo_qs_0:  1 wallclock secs ( 2.07 usr +  0.00 sys =  2.07 CPU) @ 724637.68/s (n=1500000)
  foo_qs_1:  2 wallclock secs ( 2.02 usr +  0.00 sys =  2.02 CPU) @ 742574.26/s (n=1500000)
  foo_qs_2:  0 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 2272727.27/s (n=1500000)
  foo_qs_3:  2 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 3061224.49/s (n=1500000)
  foo_qs_4:  2 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 1442307.69/s (n=1500000)
              Rate foo_or_0 foo_or_1 foo_qs_0 foo_qs_1 foo_or_4 foo_qs_4 foo_or_2 foo_qs_2 foo_qs_3 foo_or_3
foo_or_0  678733/s       --      -4%      -6%      -9%     -43%     -53%     -67%     -70%     -78%     -78%
foo_or_1  704225/s       4%       --      -3%      -5%     -41%     -51%     -66%     -69%     -77%     -77%
foo_qs_0  724638/s       7%       3%       --      -2%     -39%     -50%     -65%     -68%     -76%     -76%
foo_qs_1  742574/s       9%       5%       2%       --     -38%     -49%     -64%     -67%     -76%     -76%
foo_or_4 1190476/s      75%      69%      64%      60%       --     -17%     -43%     -48%     -61%     -61%
foo_qs_4 1442308/s     113%     105%      99%      94%      21%       --     -31%     -37%     -53%     -53%
foo_or_2 2083333/s     207%     196%     188%     181%      75%      44%       --      -8%     -32%     -32%
foo_qs_2 2272727/s     235%     223%     214%     206%      91%      58%       9%       --     -26%     -26%
foo_qs_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       --      -0%
foo_or_3 3061224/s     351%     335%     322%     312%     157%     112%      47%      35%       0%       --

Interesting results. Benchmark reports the opposite for "foofoo catbar" (my original string) of what the UNIX 'time' command did. Am I reading the results correctly?

I compared foo_or_N to foo_qs_N and this is what I have:

"foofoo catbar": QS wins (OR is 6% slower)
"foofoofoo catbar": QS wins (OR is 5% slower)
"foo foo cat bar": QS wins (OR is 8% slower)
"foo flew over the": OR and QS tie (0%)
"cufoofoo nest": QS wins (OR is 17% slower)

That leaves a question: is there a pattern or situation where OR can win?
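
On the question of reading the chart: each cell compares the row's rate with the column's rate, so the OR-versus-QS figures above can be recomputed from the reported rates. This is a small sketch, not part of the original test script; the helper name `pct_diff` is made up here, and the rates are copied from the `@ .../s` figures in the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @strings = (
    "foofoo catbar",
    "foofoofoo catbar",
    "foo foo cat bar",
    "foo flew over the",
    "cufoofoo nest",
);

# Iterations per second as reported in the benchmark output above,
# indexed the same way as @strings.
my @or_rate = (678733.03, 704225.35, 2083333.33, 3061224.49, 1190476.19);
my @qs_rate = (724637.68, 742574.26, 2272727.27, 3061224.49, 1442307.69);

# Hypothetical helper: percentage by which rate $x is faster (+) or
# slower (-) than rate $y, rounded to a whole percent.
sub pct_diff {
    my ($x, $y) = @_;
    return sprintf "%.0f", 100 * ($x / $y - 1);
}

for my $i (0 .. $#strings) {
    printf "%-20s OR relative to QS: %s%%\n",
        qq{"$strings[$i]"}, pct_diff($or_rate[$i], $qs_rate[$i]);
}
```

A negative number means the OR form ran slower than the QS form on that string, which agrees with the reading given above: QS is ahead on four of the five strings and tied on "foo flew over the".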
