in reply to Re: Regular Expressions
in thread Regular Expressions \b and \B
A good description, tachyon++ - you are correct that the lookarounds offer more flexibility but let me point out the differences in terms of benchmarks:
#!/usr/bin/perl
use Benchmark qw/cmpthese/;

$defaulttext = q/foo / x 30;
# $defaulttext = q/foobar / x 30;

cmpthese( 100_000, {
    slash_b  => q{$text=$defaulttext; $text =~ s/\bfoo\b//g;},
    neg_look => q{$text=$defaulttext; $text =~ s/(?<!\w)foo(?!\w)//g;},
    pos_look => q{$text=$defaulttext; $text =~ s/(?<=[^\w])foo(?=[^\w])//g;},
});
With $defaulttext being 'foo foo ...' all three methods take approximately the same time, because modifying $text dominates the run time.
With $defaulttext being 'foobar foobar ...' - i.e. no replacements are done - I get the following results:
            Rate pos_look neg_look  slash_b
pos_look 27894/s       --      -8%     -35%
neg_look 30441/s       9%       --     -29%
slash_b  42662/s      53%      40%       --
This shows that, when nothing matches, the \b variant is roughly 40-50% faster than either lookaround, and the negative lookaround is slightly faster (about 9%) than the positive lookaround with a negated character class.
But the most important difference can be seen from the following code:
$text = q/foo bar foo/;

($tmp = $text) =~ s/\bfoo\b//g;
print $tmp, "\n";

($tmp = $text) =~ s/(?<!\w)foo(?!\w)//g;
print $tmp, "\n";

($tmp = $text) =~ s/(?<=[^\w])foo(?=[^\w])//g;
print $tmp, "\n";

# which prints:
#  bar 
#  bar 
# foo bar foo
The positive lookaround does not behave like the others at the boundaries of the string. A positive lookaround has to match an actual character (class), but there is no character before the beginning of the string or after the end, so it fails there. The negative lookaround only asserts that no word character is present, so it succeeds even when no character is there at all.
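If you do want the positive form to work at the string edges, one possible workaround is to allow an anchor as an alternative to the lookaround. This is only a sketch and not part of the benchmark above; the variable names $fixed, $neg and $untouched are made up for illustration. The anchors sit outside the lookbehind because Perl has traditionally required lookbehind to be fixed-length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = q/foo bar foo/;

# Positive lookarounds, but the string edges are accepted as well:
#   before 'foo': start of string OR a preceding non-word character
#   after  'foo': a following non-word character OR end of string
(my $fixed = $text) =~ s/(?:^|(?<=\W))foo(?:(?=\W)|\z)//g;

# Negative lookarounds for comparison
(my $neg = $text) =~ s/(?<!\w)foo(?!\w)//g;

# 'foo' embedded in a longer word must still be left alone
(my $untouched = q/foobar xfoo/) =~ s/(?:^|(?<=\W))foo(?:(?=\W)|\z)//g;

print "[$fixed]\n";      # [ bar ]
print "[$neg]\n";        # [ bar ]
print "[$untouched]\n";  # [foobar xfoo]
```

The brackets in the output are only there to make the leftover leading and trailing spaces visible. The extra alternation makes the pattern longer than the negative lookaround without gaining anything here, so this is more of an illustration of why the original positive version fails than a recommendation.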
-- Hofmator
Re: Re: Re: Regular Expressions
by tachyon (Chancellor) on Jun 27, 2001 at 16:03 UTC