Re: Re: (bbfu) (dot star) Re: Extract potentially quoted words

I would say that using a negated character class is more efficient than using minimal matching:

                Rate   minimal_c neg_class_c     minimal   neg_class
minimal_c    94887/s          --        -11%        -40%        -44%
neg_class_c 106974/s         13%          --        -32%        -37%
minimal     157452/s         66%         47%          --         -8%
neg_class   170558/s         80%         59%          8%          --

minimal_c and neg_class_c use capturing parentheses; minimal and neg_class don't. Either way there's a small but noticeable advantage for the negated character class.

The reason for this isn't too hard to figure. With a negated character-class, the regex engine does almost no backtracking. The first thing it tries to match is [^"]+, and it succeeds every time until it finds the ".

With minimal matching, however, the engine backtracks after every character. The first thing it tries to match is ", and it fails every time, then backs up and tries matching .+?, until it finds the ". It's doing more work that way, so it's slower.
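
To make that concrete, here is a toy model of the two strategies. It is only a sketch, not how the regex engine is actually implemented: minimal_steps and neg_class_steps are names I made up for illustration, and all they do is count how many times each approach has to ask "is this the closing quote?" after the opening one.

```perl
#!perl -w
use strict;

# Toy model of /".*?"/ : after the opening quote, test for the closing
# quote first; on failure, let .*? take one more character and retry.
sub minimal_steps {
    my ($s) = @_;
    my $open = index($s, '"');
    return if $open < 0;
    my $tries = 0;
    for (my $i = $open + 1; $i < length $s; $i++) {
        $tries++;                                  # one attempt at the closing "
        return $tries if substr($s, $i, 1) eq '"';
    }
    return;                                        # no closing quote found
}

# Toy model of /"[^"]*"/ : the class runs straight through the
# non-quote characters, so the closing quote is tested only once.
sub neg_class_steps {
    my ($s) = @_;
    my $open = index($s, '"');
    return if $open < 0;
    my $close = index($s, '"', $open + 1);         # end of the [^"]* run
    return if $close < 0;
    return 1;                                      # single, successful test
}

my $s = 'abc"abc"abc';
printf "minimal:   %d tries\n", minimal_steps($s);
printf "neg_class: %d tries\n", neg_class_steps($s);
```

In this model the minimal version makes one attempt per character between the quotes plus one for the quote itself, while the negated class makes a single attempt. The real engine has optimizations this ignores, so treat the counts as an illustration of the shape of the work, not a measurement.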

#!perl -w

use strict;

use Benchmark;

Benchmark->import(qw/cmpthese/) if $^V;

my $time = shift || 10;
my $len  = shift || 1000;

my $abc = 'abc' x $len;

# qq{} interpolates $abc; a single-quoted string would contain a literal '$abc'
my $str = qq{$abc"$abc"$abc};

my %bms = (
           minimal     => sub { $str =~ /".*?"/ },
           neg_class   => sub { $str =~ /"[^\"]*"/ },
           minimal_c   => sub { $str =~ /"(.*?)"/ },
           neg_class_c => sub { $str =~ /"([^\"]*)"/ },
          );

if ($^V) {
  cmpthese(-$time, \%bms);
} else {
  timethese(-$time, \%bms);
}
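
Before trusting the timings, it may be worth checking that the two capturing patterns really do capture the same text on this kind of input. A minimal sketch, assuming a short test string (the repeat count of 3 is my choice, just to keep the output readable):

```perl
#!perl -w
use strict;

my $abc = 'abc' x 3;
my $str = qq{$abc"$abc"$abc};    # double-quoted form so $abc interpolates

# Same two capturing patterns as the benchmark, minus the needless backslash
my ($minimal)   = $str =~ /"(.*?)"/;
my ($neg_class) = $str =~ /"([^"]*)"/;

print "minimal:   $minimal\n";
print "neg_class: $neg_class\n";
print "same capture\n" if $minimal eq $neg_class;
```

Both should capture the text between the first pair of quotes, so the benchmark is comparing two ways of doing the same job.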


Re: Re: Re: (bbfu) (dot star) Re: Extract potentially quoted words by merlyn (Sage) on Jun 07, 2001 at 20:51 UTC
> With minimal matching, however, the engine backtracks after every character. The first thing it tries to match is ", and it fails every time, then backs up and tries matching .+?, until it finds the ". It's doing more work that way, so it's slower.

No, perhaps you are confusing .+ with .+?. With .+?, it's inching forward a character at a time each time it can't find a " immediately there.

-- Randal L. Schwartz, Perl hacker
Re: Re: Re: Re: (bbfu) (dot star) Re: Extract potentially quoted words by chipmunk (Parson) on Jun 07, 2001 at 21:18 UTC
> With .+?, it's inching forward a character at a time each time it can't find a " immediately there.

Is that not what I said? With /.+?"/, first the regex engine tries to match a "; then it backtracks and extends .+?. It has to backtrack for every non-quote character before the ", whereas /[^"]+"/ only backtracks once, when it gets to the ".