in reply to Re: Re: Variable-width negative lookbehind
in thread Variable-width negative lookbehind

Well, yes, the constant-width part is a big requirement for using a lookbehind, which is why I devised this method. And I'd expect one regex to be faster than two. But you also bring up the capturing parentheses, which are also an advantage of the one-regex method.
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Regexp::Keep (was: Variable-width negative lookbehind)
by Roy Johnson (Monsignor) on May 07, 2004 at 15:17 UTC
    I don't want to run down Regexp::Keep, because I think it's nifty. But I do want to point out a few things:
    1. It doesn't provide any help with variable-width negative lookbehind (which the OP thought was needed)
    2. Depending on the regexp, it can be faster or slower than the two-regexp method.
    3. It's slightly broken (see below)
    Here's some benchmarking code:
        use Benchmark 'cmpthese';
        use strict;
        use Regexp::Keep;

        my @strings = ('one two three four', 'two three four five', 'one three five');
        my @copy;

        sub replace { s/(o+ )three/$1/ for @copy=@strings }
        sub keep    { s/o+ \Kthree// for @copy=@strings }
        sub two     { /o+ (?=three)/g and s/\Gthree// for @copy=@strings }

        cmpthese(-3, {
            'replace' => \&replace,
            'keep'    => \&keep,
            'two'     => \&two,
        });

        replace; print "Replace: ", join("\n", @copy), "\n";
        keep;    print "Keep: ", join("\n", @copy), "\n";
        two;     print "Two: ", join("\n", @copy), "\n";
    As written, I got these results:
                  Rate replace    keep     two
        replace 4739/s      --    -15%    -26%
        keep    5587/s     18%      --    -13%
        two     6386/s     35%     14%      --

        Replace: one two  four
        two  four five
        one three five
        Keep: one two four
        two four five
        one three five
        Two: one two  four
        two  four five
        one three five
    Note that keep yields different output: it doesn't keep the space. If I move the \K to follow the "t" in "three" (or anywhere in "three" -- even at the end of the pattern!), it doesn't affect the output.
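For comparison, here is a minimal sketch of the output one would expect when `\K` behaves as intended. It assumes perl 5.10 or later, where `\K` is part of the core regex engine, and does not use Regexp::Keep at all. The array names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: perl 5.10+, where \K is built into the regex engine.
my @strings = ('one two three four', 'two three four five', 'one three five');

# Capture-and-replace: match the prefix, then put it back via $1.
my @by_capture = @strings;
s/(o+ )three/$1/ for @by_capture;

# \K: whatever matched before \K is left out of the replaced text,
# so the "o " prefix (including its space) survives.
my @by_keep = @strings;
s/o+ \Kthree// for @by_keep;

# Print the two results side by side; each pair should be identical.
print "$by_capture[$_]|$by_keep[$_]\n" for 0 .. $#strings;
```

With a working `\K`, both approaches delete only "three", so the space after the "o" stays in place.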

    If I change the o+ to an e+, keep wins by a hair. If I change it to an r (so that nothing matches or is substituted), replace wins and two loses.

    The lookahead is necessary for non-global replacement. For global, the subs can look like this:

        sub replace { s/(o+ )three/$1/g for @copy=@strings }
        sub keep    { s/o+ \Kthree//g for @copy=@strings }
        sub two     { do {s/\Gthree// while /o+ /g} for @copy=@strings }
    and, in addition to being much neater, keep wins by a hair.

    Update: Interestingly, this one-match alternative is significantly slower -- about a third slower than replace.

        sub two {
            /o+ (?=(three))/g and substr($_, $-[1], $+[1]-$-[1], '')
                for @copy=@strings
        }

    The PerlMonk tr/// Advocate
      What my module does is change where PL_regstartp[0] is, and it does it based on PL_reginput. PL_reginput is set too late when a regex like /ab\Kc/ is used -- it's set after the 'a' is matched, but not after the 'b' is matched, because of the EVAL node right after it, so we're using an old value.

      I've just fixed the module to get around this. Instead of /ab\Kc/ becoming /ab(?{Regexp::Keep::KEEP})c/, it becomes /ab.{0}(?{Regexp::Keep::KEEP})c/. This forces PL_reginput to be updated before the EVAL is entered.
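The general idea of running code mid-pattern to mark where the kept part ends can be sketched in pure Perl. This is only an illustration, not how Regexp::Keep is implemented: it records `pos()` inside a `(?{ })` block instead of touching PL_regstartp or PL_reginput, and `$keep_pos` is a hypothetical name. It assumes a reasonably modern perl, where `pos()` inside a code block reports the current match position.

```perl
use strict;
use warnings;

our $keep_pos;   # hypothetical name: offset recorded by the code block

my $text = 'one two three four';

# The (?{ }) block runs once "o+ " has matched; pos() there is the
# current position in the target string, i.e. where the kept prefix ends.
if ($text =~ /o+ (?{ $keep_pos = pos() })three/) {
    # Delete only the part of the match after the recorded offset.
    # $+[0] is the offset of the end of the whole match.
    substr($text, $keep_pos, $+[0] - $keep_pos, '');
}

print "$text\n";
```

Here the recorded offset is taken after the space has matched, so the space is kept, which is the behaviour the `.{0}` workaround is meant to restore.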

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;