in reply to Re: (bbfu) (dot star) Re: Extract potentially quoted words
in thread Extract potentially quoted words
minimal_c and neg_class_c use capturing parentheses; minimal and neg_class don't. Either way there's a small but noticeable advantage for the negated character class.

                      Rate   minimal_c neg_class_c     minimal   neg_class
    minimal_c      94887/s          --        -11%        -40%        -44%
    neg_class_c   106974/s         13%          --        -32%        -37%
    minimal       157452/s         66%         47%          --         -8%
    neg_class     170558/s         80%         59%          8%          --
The reason for this isn't too hard to figure. With a negated character-class, the regex engine does almost no backtracking. The first thing it tries to match is [^"]+, and it succeeds every time until it finds the ".
With minimal matching, however, the engine backtracks after every character. At each position, the first thing it tries to match is the closing ", and it fails every time, so it backs up and lets .+? consume one more character, until it finds the ". It's doing more work that way, so it's slower.
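To make the difference concrete, here is a toy model of the two strategies. It is only a sketch: the subs count_minimal and count_neg_class are made up for illustration, they are not how perl's regex engine is actually implemented, and they ignore details such as . not matching a newline. All they count is how many times each strategy tests for the closing quote.

```perl
#!perl -w
use strict;

# Toy model (an assumption for illustration, not perl's real engine).

# Minimal matching, as in /"(.*?)"/: after the opening quote, try the
# closing quote first; on failure, let .*? take one more character.
sub count_minimal {
    my ($s) = @_;
    my $pos = index($s, '"');
    return unless $pos >= 0;
    $pos++;                          # step past the opening quote
    my $start = $pos;
    my $tries = 0;
    while ($pos < length $s) {
        $tries++;                    # attempt the closing quote here
        if (substr($s, $pos, 1) eq '"') {
            return ($tries, substr($s, $start, $pos - $start));
        }
        $pos++;                      # failed: extend .*? by one character
    }
    return;                          # no closing quote found
}

# Negated class, as in /"([^"]*)"/: consume the whole run of non-quote
# characters first, then test for the closing quote once.
sub count_neg_class {
    my ($s) = @_;
    my $pos = index($s, '"');
    return unless $pos >= 0;
    $pos++;                          # step past the opening quote
    my $start = $pos;
    $pos++ while $pos < length($s) && substr($s, $pos, 1) ne '"';
    return unless $pos < length $s;  # ran off the end: no closing quote
    return (1, substr($s, $start, $pos - $start));
}

my $str = 'abc"abcabc"abc';
my ($m_tries, $m_body) = count_minimal($str);
my ($n_tries, $n_body) = count_neg_class($str);
print "minimal:   $m_tries closing-quote attempts, body '$m_body'\n";
print "neg_class: $n_tries closing-quote attempt, body '$n_body'\n";
```

In this model the minimal version makes one attempt per character of the quoted text plus one for the successful match, while the negated class makes a single attempt. The real engine has optimizations of its own, so treat this as a picture of the idea rather than a measurement.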
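    #!perl -w
    use strict;
    use Benchmark;
    # Newer perls set $^V; there, import cmpthese for the comparison table.
    Benchmark->import(qw/cmpthese/) if $^V;

    my $time = shift || 10;      # CPU seconds to run each benchmark
    my $len  = shift || 1000;    # repetitions of 'abc' in each segment
    my $abc  = 'abc' x $len;
    # qq{} interpolates $abc; single quotes would leave a literal '$abc'.
    my $str  = qq{$abc"$abc"$abc};

    my %bms = (
        minimal     => sub { $str =~ /".*?"/ },
        neg_class   => sub { $str =~ /"[^\"]*"/ },
        minimal_c   => sub { $str =~ /"(.*?)"/ },
        neg_class_c => sub { $str =~ /"([^\"]*)"/ },
    );

    # A negative count means "run for at least this many CPU seconds".
    if ($^V) {
        cmpthese(-$time, \%bms);
    } else {
        timethese(-$time, \%bms);
    }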
Re: Re: Re: (bbfu) (dot star) Re: Extract potentially quoted words
by merlyn (Sage) on Jun 07, 2001 at 20:51 UTC
by chipmunk (Parson) on Jun 07, 2001 at 21:18 UTC