in reply to Re: Re: Re: Variable-width negative lookbehind
in thread Variable-width negative lookbehind

I don't want to run down Regexp::Keep, because I think it's nifty. But I do want to point out a few things:
  1. It doesn't provide any help with variable-width negative lookbehind (which the OP thought was needed)
  2. Depending on the regexp, it can be faster or slower than the two-regexp method.
  3. It's slightly broken (see below)
Here's some benchmarking code:
use Benchmark 'cmpthese';
use strict;
use Regexp::Keep;

my @strings = ('one two three four', 'two three four five', 'one three five');
my @copy;

sub replace { s/(o+ )three/$1/ for @copy=@strings }
sub keep    { s/o+ \Kthree// for @copy=@strings }
sub two     { /o+ (?=three)/g and s/\Gthree// for @copy=@strings }

cmpthese(-3, {
    'replace' => \&replace,
    'keep'    => \&keep,
    'two'     => \&two
});

replace; print "Replace: ", join("\n", @copy), "\n";
keep;    print "Keep: ", join("\n", @copy), "\n";
two;     print "Two: ", join("\n", @copy), "\n";
As written, I got these results:
          Rate replace    keep     two
replace 4739/s      --    -15%    -26%
keep    5587/s     18%      --    -13%
two     6386/s     35%     14%      --
Replace: one two  four
two  four five
one three five
Keep: one two four
two four five
one three five
Two: one two  four
two  four five
one three five
Note that keep yields different output: it doesn't keep the space. If I move the \K to follow the "t" in "three" (or anywhere in "three" -- even at the end of the pattern!), it doesn't affect the output.
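For comparison, here is a minimal sketch of what the keep version is expected to produce. It assumes Perl 5.10 or later, where \K is a core regex feature, and it deliberately does not load Regexp::Keep, so it says nothing about the module's own behaviour. The array names @captured and @kept are mine, chosen only for this illustration.

```perl
use strict;
use warnings;
use 5.010;    # assumption: core \K, available from Perl 5.10 on

my @strings = ('one two three four', 'two three four five', 'one three five');

# Capture-and-restore: match the prefix, then put it back via $1.
my @captured = @strings;
s/(o+ )three/$1/ for @captured;

# Core \K: whatever matched before \K is left out of the replaced text,
# so the "o+ " prefix (including its space) survives the substitution.
my @kept = @strings;
s/o+ \Kthree// for @kept;

print "Capture: [$_]\n" for @captured;
print "Keep:    [$_]\n" for @kept;
```

With a working \K the two lists come out identical, space included, which is the behaviour the broken keep output above fails to show.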

If I change the o+ to an e+, keep wins by a hair. If I change it to an r (so that nothing matches or is substituted), replace wins and two loses.

The lookahead is necessary for non-global replacement. For global, the subs can look like this:

sub replace { s/(o+ )three/$1/g for @copy=@strings }
sub keep    { s/o+ \Kthree//g for @copy=@strings }
sub two     { do {s/\Gthree// while /o+ /g} for @copy=@strings }
and, in addition to being much neater, keep wins by a hair.
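To see why the non-global version needs the lookahead, it may help to watch pos() directly. The sketch below uses a hypothetical input string whose first run of "o+ " is not followed by "three"; the variable names are also mine. It relies only on two documented behaviours: a scalar-context //g match leaves pos() at the end of the match, and \G anchors the next match at pos().

```perl
use strict;
use warnings;

# Hypothetical input: the first "o+ " run ("oo ") is not followed by "three".
my $with_lookahead    = 'foo bar two three';
my $without_lookahead = 'foo bar two three';

# The lookahead forces the match to sit directly before "three",
# so pos() ends up exactly where "three" starts.
my $found_with = $with_lookahead =~ /o+ (?=three)/g;
my $pos_with   = pos($with_lookahead);
$with_lookahead =~ s/\Gthree//;       # \G anchors at pos(): deletes "three"

# Without the lookahead the first match is "oo ", so pos() points at "bar"
# and the \G-anchored substitution finds no "three" there.
my $found_without = $without_lookahead =~ /o+ /g;
my $pos_without   = pos($without_lookahead);
$without_lookahead =~ s/\Gthree//;    # no match, string is unchanged

print "with lookahead:    [$with_lookahead] (pos was $pos_with)\n";
print "without lookahead: [$without_lookahead] (pos was $pos_without)\n";
```

The global version sidesteps this by trying the \G substitution after every "o+ " match instead of only the first.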

Update: Interestingly, this one-match alternative is significantly slower -- about a third slower than replace.

sub two { /o+ (?=(three))/g and substr($_, $-[1], $+[1]-$-[1], '') for @copy=@strings }
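The one-match alternative leans on the @- and @+ arrays, which hold the start and end offsets of each capture group from the last successful match. Here is a small standalone sketch of that step, using names of my own ($s, $start, $end) and a single string rather than the benchmark's @copy:

```perl
use strict;
use warnings;

my $s = 'one two three four';
my ($start, $end);

# The capture sits inside the lookahead, so "three" is located
# but is not part of the overall match.
if ($s =~ /o+ (?=(three))/) {
    ($start, $end) = ($-[1], $+[1]);    # offsets of capture group 1

    # Four-argument substr replaces that slice in place with ''.
    substr($s, $start, $end - $start, '');
}

print "removed offsets $start..$end, result: [$s]\n";
```

The offsets are copied out before substr runs, so the sketch does not depend on @- and @+ staying valid after the string is modified.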

The PerlMonk tr/// Advocate

Re: Regexp::Keep (was: Variable-width negative lookbehind)
by japhy (Canon) on May 07, 2004 at 17:25 UTC
    What my module does is change where PL_regstartp[0] is, and it does it based on PL_reginput. PL_reginput is set too late when a regex like /ab\Kc/ is used -- it's set after the 'a' is matched, but not after the 'b' is matched, because of the EVAL node right after it, so we're using an old value.

    I've just fixed the module to get around this. Instead of /ab\Kc/ becoming /ab(?{Regexp::Keep::KEEP})c/, it becomes /ab.{0}(?{Regexp::Keep::KEEP})c/. This forces PL_reginput to be updated before the EVAL is entered.
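As a rough illustration of the mechanism the rewrite depends on, the sketch below shows a (?{ ... }) code block running partway through a match and observing the current position with pos(). It does not use Regexp::Keep or touch PL_reginput or PL_regstartp, and it assumes a modern Perl in which pos() is reliable inside such a block. The variable $where is hypothetical.

```perl
use strict;
use warnings;

our $where;    # hypothetical package variable, used only for this illustration

# The code block runs after "ab" has matched and before "c" is tried,
# so pos() inside it reports the offset just past "ab".
my $matched = 'xabc' =~ /ab(?{ $where = pos() })c/;

print "matched, code block ran at offset $where\n" if $matched;
```

A \K implemented this way has to move the start of the match to the offset the code block sees, which is why a stale position at that point gives the wrong result.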

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;