Thanks to everyone.

I did my own benchmark based on your suggestions, and found the following:

#!perl -w use strict; use warnings; use Benchmark 'cmpthese'; my $test = join '', map { ' ' x (0 == int rand 9) . ',' . ' ' x ((0 = += int rand 3) * int rand 3) . $_ } 0 .. 1_000; cmpthese(-10,{ 'test1' => sub { my $f = $test; $f =~ s/\s+,\s+|\s+,|,\s+/,/g; }, 'testA' => sub { my $f = $test; $f =~ s/\s+,\s+/,/g; $f =~ s/\s+,/,/ +g; $f =~ s/,\s+/,/g; }, 'test2' => sub { my $f = $test; $f =~ s/\s+,\s*|,\s+/,/g; }, 'testB' => sub { my $f = $test; $f =~ s/\s+,\s*/,/g; $f =~ s/,\s+/,/ +g; }, 'test3' => sub { my $f = $test; $f =~ s/\s*,\s*/,/g; }, });

Results using ActivePerl 5.10.0 on WinXP-SP3:

1_000: Rate test1 test2 test3 testA testB test1 974/s -- -22% -56% -66% -71% test2 1256/s 29% -- -43% -56% -63% test3 2221/s 128% 77% -- -22% -35% testA 2852/s 193% 127% 28% -- -16% testB 3412/s 250% 172% 54% 20% -- 10_000: Rate test1 test2 test3 testA testB test1 79/s -- -24% -62% -72% -77% test2 103/s 31% -- -50% -63% -70% test3 205/s 161% 99% -- -26% -39% testA 277/s 252% 169% 35% -- -18% testB 339/s 331% 228% 65% 22% -- 100_000: Rate test1 test2 test3 testA testB test1 6.5/s -- -26% -67% -76% -81% test2 8.9/s 36% -- -55% -68% -74% test3 19.6/s 200% 122% -- -28% -42% testA 27.3/s 319% 209% 39% -- -19% testB 33.6/s 414% 279% 71% 23% --

Results are listed from slower to faster regexp(s).

Tests 1 and 3 are regexps from my original question, and the intermediate 2's regexp was supplied by ig. A and B are 1 and 2 respectively, but splitted on alternations (replace in phases).

Possible quick conclusions:

  1. It's better to use backtracking than alternations.
  2. It's better to split a regexp with an alternation whenever is possible.
  3. Itīs better to replace a pattern by the same thing instead of doing some effort to avoid that.

Comments?


In reply to Re: Comparing Regular Expressions by vitoco
in thread Comparing Regular Expressions by vitoco

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.