davidrw has asked for the wisdom of the Perl Monks concerning the following question:

The following are functionally identical (remove all x's from a string), but what is the technical difference (in efficiency/speed; in how the regex engine deals with it) between these two, if any?
s/x//g; s/x+//g;
I've wondered this in the past, and an answer of mine w/a follow-up by OP in this thread prompted me to post this...

Replies are listed 'Best First'.
Re: regex internals: quantifiers vs global match
by samtregar (Abbot) on May 16, 2005 at 04:07 UTC
    As with most performance questions the best way to find out is to Benchmark it:

    use Benchmark qw(cmpthese); cmpthese(-5, { NoPlus => sub { $_ = 'a' . ('x' x 1000) . 'b'; s/x//g } +, Plus => sub { $_ = 'a' . ('x' x 1000) . 'b'; s/x+//g +} });

    Produces these results:

    $ perl subs.pl Rate NoPlus Plus NoPlus 2674/s -- -98% Plus 115762/s 4230% --

    So it would appear that using + for your iteration is 42 times faster than using s///g! Of course the exact difference will depend on the string being matched. When I changed the number of x's to 100 the difference dropped to 12 times. When I changed it to produce a string of 'axaxax...' there was almost no difference between the two solutions.

    -sam

Re: regex internals: quantifiers vs global match
by kvale (Monsignor) on May 16, 2005 at 03:11 UTC
    The second form s/x+//g will generally be faster because adjacent x's in the string will be matched in a fast inner loop in the regex bytecode execution code, vs. traversing the whole regex execution call tree for each x.

    -Mark

Re: regex internals: quantifiers vs global match
by K_M_McMahon (Hermit) on May 16, 2005 at 04:08 UTC
    UPDATE: samtregar beat me to the punch. But, based on your question, I think a random string containing X's and non-x characters is a more valid test for benchmarking....Just my humble opinion.

    Your question was interesting so I ran a quick benchmark..... Both on my linux box and a windows machine. both showed that the regex with the plus was a little faster. For the reasons pointed out by kvale.
    Benchmark code:
    #!/usr/bin/perl use strict; use warnings; use Benchmark; $| = 1; my $clean="kljansdljzxnkxnlkxjnxlkjnxlkxxxlkjnlkjnlkxjnkxjnlkxjnlxkjnl +kxxluhasiuuhxkxhkxhxxxxlkjsndifnhskxhkxhx"; timethese ( 10000000, { 'with plus' => sub { &plus}, 'without plus' => sub { &no_plus }, } ); sub no_plus { my $temp=$clean; $temp=~s/x//g; } sub plus { my $temp=$clean; $temp=~s/x+//g; }
    Results (from windows)
    C:\Documents and Settings\Kevin\My Documents\scripts>test.pl Benchmark: timing 10000000 iterations of with plus, without plus... with plus: 75 wallclock secs (72.48 usr + 0.01 sys = 72.49 CPU) @ 13 +7959.58/s (n=10000000) without plus: 86 wallclock secs (84.41 usr + 0.01 sys = 84.42 CPU) @ +118453.94/s (n=10000000)
    Results (from Linux)
    kevin@kevin-linux:~> perl test.pl Benchmark: timing 10000000 iterations of with plus, without plus... with plus: 124 wallclock secs (123.54 usr + 0.00 sys = 123.54 CPU) @ + 80945.44/s (n=10000000) without plus: 146 wallclock secs (146.38 usr + 0.00 sys = 146.38 CPU) + @ 68315.34/s (n=10000000)


    -Kevin
    my $a='62696c6c77667269656e6440676d61696c2e636f6d'; while ($a=~m/(^.{2})/s) {print unpack('A',pack('H*',"$1"));$a=~s/^.{2}//s;}
Re: regex internals: quantifiers vs global match
by Animator (Hermit) on May 16, 2005 at 11:20 UTC

    It really depends on your input...

    Anoter alternative for your regex would be: tr/x//d; (but ofcourse this only works when you want to remove some charachters and not a word).

    What I noticed during benchmarking is that the less x's follows eachother the faster tr/// gets (or the slower s/// gets). Below you find the benchmarks I ran:

    String: 'a' . ('x' x 1000) . 'b':

    Rate NoPlus Tr Plus NoPlus 2465/s -- -97% -98% Tr 94056/s 3715% -- -15% Plus 111285/s 4414% 18% --

    String: 'a' . ('x' x 100) . 'b':

    Rate NoPlus Plus Tr NoPlus 22929/s -- -91% -93% Plus 245996/s 973% -- -25% Tr 328951/s 1335% 34% --

    String: 'ax' x 1000:

    Rate NoPlus Plus Tr NoPlus 2224/s -- -6% -95% Plus 2367/s 6% -- -95% Tr 49416/s 2122% 1988% --

    String: 'ax' x 100:

    Rate NoPlus Plus Tr NoPlus 21151/s -- -4% -93% Plus 22038/s 4% -- -93% Tr 302771/s 1331% 1274% --

    Based on these results I would most likely use tr/x//d; ...

Re: regex internals: quantifiers vs global match
by DrHyde (Prior) on May 16, 2005 at 09:00 UTC
    If you're interested in how regex engines work, read Friedl's excellent book "Mastering Regular Expressions".
      Excellent, but terse!