Re: Re: Regular Expressions

A good description, tachyon++. You are correct that the lookarounds offer more flexibility, but let me point out some differences, starting with benchmarks:

#!/usr/bin/perl
use Benchmark qw/cmpthese/;

$defaulttext = q/foo / x 30;
# $defaulttext = q/foobar / x 30;

cmpthese( 100_000, {
    slash_b => q{$text=$defaulttext; $text =~ s/\bfoo\b//g;},
    neg_look=> q{$text=$defaulttext; $text =~ s/(?<!\w)foo(?!\w)//g;},
    pos_look=> q{$text=$defaulttext; $text =~ s/(?<=[^\w])foo(?=[^\w])//g;},
});

With $defaulttext being 'foo foo ...' all three methods take approximately the same time, because the cost of modifying $text dominates the run time.

With $defaulttext being 'foobar foobar ...' - i.e. no replacements are done - I get the following results:

          Rate    pos_look  neg_look  slash_b
pos_look 27894/s     --        -8%     -35%
neg_look 30441/s     9%         --     -29%
slash_b  42662/s    53%        40%      --

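This shows that the \b variant is about 40% faster than the negative lookaround and about 53% faster than the positive lookaround with its negated character class. The negative lookaround is in turn about 9% faster than the positive one.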

But the most important difference can be seen from the following code:

$text= q/foo bar foo/;
($tmp = $text) =~ s/\bfoo\b//g;
print $tmp,"\n";

($tmp = $text) =~ s/(?<!\w)foo(?!\w)//g;
print $tmp,"\n";

($tmp = $text) =~ s/(?<=[^\w])foo(?=[^\w])//g;
print $tmp,"\n";

# which prints:
 bar
 bar
foo bar foo

The positive lookaround does not behave like the others at the boundaries of the string. A positive lookaround requires an actual character (here one matching [^\w]) to be present, and since there is no character before the beginning of the string or after the end, it fails there. The negative lookaround only requires that no word character be present, so it succeeds even when there is no character at all.
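
If you do want to stay with positive lookarounds, one possible workaround is to accept the string edges explicitly next to the lookaround. The following is only a minimal sketch: strip_foo is a hypothetical helper name of mine, and the anchors ^ and \z are my addition to the patterns shown above. I have not benchmarked this variant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an illustration only): removes the whole word
# 'foo' with positive lookarounds that also accept the string edges.
sub strip_foo {
    my ($text) = @_;
    # Before 'foo': either the start of the string, or a non-word
    # character behind us. After 'foo': either a non-word character
    # ahead of us, or the very end of the string.
    $text =~ s/(?:^|(?<=\W))foo(?:(?=\W)|\z)//g;
    return $text;
}

print '[', strip_foo('foo bar foo'), "]\n";   # prints [ bar ]
print '[', strip_foo('foobar foo'),  "]\n";   # prints [foobar ]
```

The alternation sits outside the lookbehind so that the lookbehind itself stays fixed-width. For a word such as foo, which starts and ends with word characters, this should give the same result as the \b version.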

-- Hofmator

Re: Re: Re: Regular Expressions by tachyon (Chancellor) on Jun 27, 2001 at 16:03 UTC
Good points. You might have noticed that I carefully used the word 'similar' rather than 'same'. As you point out, there are differences both in speed and in what matches where. My grasp of regexes continues to grow, thanks in large part to posts like these ++ cheers tachyon