in reply to Re: question about reg exp engine
in thread question about reg exp engine

Yes, I have a typo in the post. It should be ^\s, but that still doesn't give me an answer. Why does it run slower than running all three substitutions on different lines?

Re^3: question about reg exp engine
by dave_the_m (Monsignor) on Aug 03, 2008 at 23:17 UTC
    Why does it run slower than running all three substitutions on different lines?
    Because the first three are all optimisable; they are all explicitly anchored to the beginning or end of the string, and the regex engine is smart enough to try the match only at the beginning or end of the string, respectively.

    The combined pattern is too complex to be optimised, so the engine naively tries matching at every position in the (long) string.

    Dave.
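
For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes the three patterns from the benchmark posted in this thread and a long, mostly-whitespace test string. The sub names trim_separate and trim_combined are made up for illustration, and no particular timings are implied, since they depend on the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical helper: three substitutions, each with a single anchor.
sub trim_separate {
    my ($s) = @_;
    $s =~ s/^\s//g;
    $s =~ s/\s$//g;
    $s =~ s/\s+$//g;
    return $s;
}

# Hypothetical helper: the same three patterns joined by alternation.
sub trim_combined {
    my ($s) = @_;
    $s =~ s/^\s|\s$|\s+$//g;
    return $s;
}

# Long input (an assumption): lots of whitespace around one character.
my $long = ( ' ' x 1000 ) . 'x' . ( ' ' x 1000 );

# Sanity check: both variants must produce the same string.
die "results differ\n" unless trim_separate($long) eq trim_combined($long);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    separate => sub { trim_separate($long) },
    combined => sub { trim_combined($long) },
} );
```

Swapping $long for a short string such as " stuff " is a quick way to see how much the comparison depends on the data.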

Re^3: question about reg exp engine
by broomduster (Priest) on Aug 03, 2008 at 23:55 UTC
    yes I have a typo in the post. It should be ^\s, but that still doesn't give me an answer.
    If you fix that typo and then run benchmarks, I think you will see that they are about the same speed. I see speed differences of 0-3% with the typo fixed, and 20-25% with the typo in place, probably because some optimization is possible when the regex says "beginning of string" rather than '^' in an arbitrary place in the pattern.

      Show your benchmark. I see about a 75x difference between the single regex and the multiple regex solutions offered by the OP.

      Note that benchmarks are rather like statistics: Lies, damn lies and benchmarks.


      Perl reduces RSI - it saves typing
        I have a feeling I may be about to learn something... ;-)
        use strict;
        use warnings;
        use Benchmark qw( cmpthese );

        my $results = cmpthese( -10, {
            'r3' => sub {
                my $string = " stuff ";
                $string =~ s/^\s//g;
                $string =~ s/\s$//g;
                $string =~ s/\s+$//g;
            },
            'r1' => sub {
                my $string = " stuff ";
                $string =~ s/^\s|\s$|\s+$//g;
            },
        } );

               Rate   r1   r3
        r1 295926/s   --  -1%
        r3 299980/s   1%   --

        Updated: Now that I see GrandFather's detailed Benchmark below, I see that my error was to use a short string. When I change to
        my $string = (' ' x 1000) . 'x' . (' ' x 1000);
        matching GrandFather's, I get the following:
               Rate    r3    r1
        r3  61890/s    --  -79%
        r1 297607/s  381%    --
        So Lies, damn lies, and benchmarks (with the wrong data), indeed.