in reply to Re: Variable-width negative lookbehind
in thread Variable-width negative lookbehind

Regarding #3:
While the example you give is easily translated into a (non-variable-width) lookbehind (s/(?<=[^Y])X+//), the more general case of your
s/$regex1\K$regex2//;
can be achieved with
/$regex1/g and s/\G$regex2//;
(except for side effects of capturing parentheses).
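A minimal sketch of that equivalence, assuming perl 5.10 or later, where \K is built into the regex engine and Regexp::Keep is not needed. The patterns and the input string are made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical patterns, chosen only for illustration.
my $regex1 = qr/id=/;    # the part to keep
my $regex2 = qr/\d+/;    # the part to delete

# One regex: \K drops everything matched so far from the replaced span.
sub with_keep {
    my ($s) = @_;
    $s =~ s/$regex1\K$regex2//;
    return $s;
}

# Two regexes: m//g in scalar context leaves pos() just after $regex1,
# and \G anchors the substitution at that position.
sub with_two {
    my ($s) = @_;
    $s =~ /$regex1/g and $s =~ s/\G$regex2//;
    return $s;
}

my $input = 'id=12345; id=678';
print with_keep($input), "\n";
print with_two($input), "\n";
```

Note that the two-regex form only looks at the first place $regex1 matches. If $regex2 does not follow at that spot, nothing is removed, whereas the one-regex form keeps searching.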

The PerlMonk tr/// Advocate

Re: Re: Re: Variable-width negative lookbehind
by japhy (Canon) on May 06, 2004 at 23:26 UTC
    Well, yes, the constant-width part is a big requirement for using a lookbehind, which is why I devised this method. And I'd expect one regex to be faster than two. But you also bring up the capturing parentheses, which are also an advantage of the one-regex method.
    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
      I don't want to run down Regexp::Keep, because I think it's nifty. But I do want to point out a few things:
      1. It doesn't provide any help with variable-width negative lookbehind (which the OP thought was needed)
      2. Depending on the regexp, it can be faster or slower than the two-regexp method.
      3. It's slightly broken (see below)
      Here's some benchmarking code:
          use Benchmark 'cmpthese';
          use strict;
          use Regexp::Keep;

          my @strings = ('one two three four', 'two three four five', 'one three five');
          my @copy;

          sub replace { s/(o+ )three/$1/ for @copy=@strings }
          sub keep    { s/o+ \Kthree// for @copy=@strings }
          sub two     { /o+ (?=three)/g and s/\Gthree// for @copy=@strings }

          cmpthese(-3, {
              'replace' => \&replace,
              'keep'    => \&keep,
              'two'     => \&two
          });

          replace; print "Replace: ", join("\n", @copy), "\n";
          keep;    print "Keep: ", join("\n", @copy), "\n";
          two;     print "Two: ", join("\n", @copy), "\n";
      As written, I got these results:
                    Rate replace    keep     two
          replace 4739/s      --    -15%    -26%
          keep    5587/s     18%      --    -13%
          two     6386/s     35%     14%      --

          Replace: one two  four
          two  four five
          one three five
          Keep: one two four
          two four five
          one three five
          Two: one two  four
          two  four five
          one three five
      Note that keep yields different output: it doesn't keep the space. If I move the \K to follow the "t" in "three" (or anywhere in "three" -- even at the end of the pattern!), it doesn't affect the output.

      If I change the o+ to an e+, keep wins by a hair. If I change it to an r (so that nothing matches or is substituted), replace wins and two loses.

      The lookahead is necessary for non-global replacement. For global, the subs can look like this:

          sub replace { s/(o+ )three/$1/g for @copy=@strings }
          sub keep    { s/o+ \Kthree//g for @copy=@strings }
          sub two     { do {s/\Gthree// while /o+ /g} for @copy=@strings }
      and, in addition to being much neater, keep wins by a hair.

      Update: Interestingly, this one-match alternative is significantly slower -- about a third slower than replace.

          sub two { /o+ (?=(three))/g and substr($_, $-[1], $+[1]-$-[1], '') for @copy=@strings }

      The PerlMonk tr/// Advocate
        What my module does is change where PL_regstartp[0] is, and it does it based on PL_reginput. PL_reginput is set too late when a regex like /ab\Kc/ is used -- it's set after the 'a' is matched, but not after the 'b' is matched, because of the EVAL node right after it, so we're using an old value.

        I've just fixed the module to get around this. Instead of /ab\Kc/ becoming /ab(?{Regexp::Keep::KEEP})c/, it becomes /ab.{0}(?{Regexp::Keep::KEEP})c/. This forces PL_reginput to be updated before the EVAL is entered.
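The idea of marking a position partway through a match can be sketched in plain Perl. This is only an analogy, not how Regexp::Keep is implemented (the module adjusts the engine's internal match start). It assumes a reasonably modern perl, where pos() inside a (?{ ... }) block reports the current match position; $keep and delete_after_mark are hypothetical names.

```perl
use strict;
use warnings;

our $keep;    # package variable, visible inside the (?{ ... }) block

# Delete only the part of the match that follows the marked position.
sub delete_after_mark {
    my ($s) = @_;
    # The code block runs once 'ab' has matched and records pos() there.
    if ($s =~ /ab(?{ $keep = pos() })c/) {
        # Remove the text from the mark to the end of the match.
        substr($s, $keep, $+[0] - $keep, '');
    }
    return $s;
}

print delete_after_mark('xabc'), "\n";    # prints "xab"
```

With patterns that backtrack past the code block, the recorded value may come from an abandoned attempt, so a more careful version would use local inside the block.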

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;