in reply to Variable-width negative lookbehind

Here are a handful of ways:
  1. Break the look-behind into two look-behinds. Make sure it's not preceded by a Y, and then make sure it's not preceded by the beginning of the string:
    $string =~ s/(?<!Y)(?<!^)X//g;
  2. Reverse the string and reverse the sense of the regex. Match an X that is followed by neither a Y nor the end of the string:
    my $rstr = reverse $string;
    $rstr =~ s/X(?!Y|\Z)//g;  # note \Z, not $
    $string = reverse $rstr;
  3. Use my Regexp::Keep module (which I hope can be refactored to a standard regex assertion). It provides an "anchor", \K, which saves you from having to replace what you've matched with what you've matched. You'll see the difference here:
    # old:
    # $string =~ s/([^Y])X/$1/g;
    # new:
    $string =~ s/[^Y]\KX//;
Regexp::Keep's \K anchor basically resets where Perl thinks it has started matching. See its documentation for more explanation.
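As a quick sanity check, here is a minimal sketch that runs the three approaches on a made-up sample string. The sub names and the sample data are my own illustration, not part of the original question. The third sub assumes Perl 5.10 or later, where \K is available as a built-in regex escape, so it does not load Regexp::Keep.

```perl
use strict;
use warnings;

# Approach 1: two fixed-width negative look-behinds.
sub two_lookbehinds {
    my ($s) = @_;
    $s =~ s/(?<!Y)(?<!^)X//g;
    return $s;
}

# Approach 2: reverse the string, use a negative look-ahead, reverse back.
sub reversed_lookahead {
    my ($s) = @_;
    my $r = reverse $s;        # scalar assignment, so this reverses the string
    $r =~ s/X(?!Y|\Z)//g;
    return scalar reverse $r;
}

# Approach 3: assumes the built-in \K of Perl 5.10+ (not Regexp::Keep).
sub keep_escape {
    my ($s) = @_;
    $s =~ s/[^Y]\KX//g;
    return $s;
}

my $sample = 'XaXYXbX';        # hypothetical input
print two_lookbehinds($sample),    "\n";
print reversed_lookahead($sample), "\n";
print keep_escape($sample),        "\n";
```

All three print XaYXb for this input. They are not interchangeable in general: because [^Y] consumes a character, the \K form leaves the second X of a run such as "aXX" in place, while the look-behind forms remove both.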
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Re: Variable-width negative lookbehind
by Roy Johnson (Monsignor) on May 06, 2004 at 19:36 UTC
    Regarding #3:
    While the example you give is easily translated into a (non-variable-width) lookbehind (s/(?<=[^Y])X+//), the more general case of your
    s/$regex1\K$regex2//;
    can be achieved with
    /$regex1/g and s/\G$regex2//;
    (except for side effects of capturing parentheses).
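    To make the equivalence concrete, here is a rough sketch with fixed patterns standing in for $regex1 and $regex2. The sub names and test strings are hypothetical, the \K is assumed to be the built-in one from Perl 5.10+ rather than Regexp::Keep, and the (?=three) lookahead in the two-step version is an added assumption so that the first regex only stops where the second can match.

```perl
use strict;
use warnings;

# One regex: \K keeps the "o+ " prefix out of the text that gets replaced.
sub with_keep {
    my ($s) = @_;
    $s =~ s/o+ \Kthree//;
    return $s;
}

# Two regexes: the first match (with /g, in scalar context) sets pos($s);
# the substitution is then anchored at that position with \G.
sub two_step {
    my ($s) = @_;
    $s =~ /o+ (?=three)/g and $s =~ s/\Gthree//;
    return $s;
}

for my $str ('one two three four', 'so two three', 'one three five') {
    printf "%-20s keep=[%s] two=[%s]\n", $str, with_keep($str), two_step($str);
}
```

    For these inputs both subs return the same string; the brackets in the output make the leftover whitespace visible.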

    The PerlMonk tr/// Advocate
      Well, yes, the constant-width part is a big requirement for using a lookbehind, which is why I devised this method. And I'd expect one regex to be faster than two. But you also bring up the capturing parentheses, which are also an advantage of the one-regex method.
      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
        I don't want to run down Regexp::Keep, because I think it's nifty. But I do want to point out a few things:
        1. It doesn't provide any help with variable-width negative lookbehind (which the OP thought was needed)
        2. Depending on the regexp, it can be faster or slower than the two-regexp method.
        3. It's slightly broken (see below)
        Here's some benchmarking code:
        use Benchmark 'cmpthese';
        use strict;
        use Regexp::Keep;

        my @strings = ('one two three four', 'two three four five', 'one three five');
        my @copy;

        sub replace { s/(o+ )three/$1/ for @copy=@strings }
        sub keep { s/o+ \Kthree// for @copy=@strings }
        sub two { /o+ (?=three)/g and s/\Gthree// for @copy=@strings }

        cmpthese(-3, {
            'replace' => \&replace,
            'keep' => \&keep,
            'two' => \&two
        });

        replace; print "Replace: ", join("\n", @copy), "\n";
        keep; print "Keep: ", join("\n", @copy), "\n";
        two; print "Two: ", join("\n", @copy), "\n";
        As written, I got these results:
                  Rate replace    keep     two
        replace 4739/s      --    -15%    -26%
        keep    5587/s     18%      --    -13%
        two     6386/s     35%     14%      --
        Replace: one two  four
        two  four five
        one three five
        Keep: one two four
        two four five
        one three five
        Two: one two  four
        two  four five
        one three five
        Note that keep yields different output: it doesn't keep the space. If I move the \K to follow the "t" in "three" (or anywhere in "three" -- even at the end of the pattern!), it doesn't affect the output.

        If I change the o+ to an e+, keep wins by a hair. If I change it to an r (so that nothing matches or is substituted), replace wins and two loses.

        The lookahead is necessary for non-global replacement. For global, the subs can look like this:

        sub replace { s/(o+ )three/$1/g for @copy=@strings }
        sub keep { s/o+ \Kthree//g for @copy=@strings }
        sub two { do {s/\Gthree// while /o+ /g} for @copy=@strings }
        and, in addition to being much neater, keep wins by a hair.

        Update: Interestingly, this one-match alternative is significantly slower -- about a third slower than replace.

        sub two {
            /o+ (?=(three))/g and substr($_, $-[1], $+[1]-$-[1], '')
                for @copy=@strings
        }
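        For anyone unfamiliar with the offset arrays used there: after a successful match, $-[1] and $+[1] hold the start and end offsets of capture group 1, so their difference is the length to hand to four-argument substr. A minimal standalone sketch follows; the sub name cut_group and the input string are only illustrative.

```perl
use strict;
use warnings;

# Remove the text captured inside the lookahead, using the match offsets.
sub cut_group {
    my ($s) = @_;
    if ($s =~ /o+ (?=(three))/) {
        my ($start, $end) = ($-[1], $+[1]);      # offsets of group 1
        substr($s, $start, $end - $start, '');   # 4-arg substr edits $s in place
    }
    return $s;
}

print '[', cut_group('one two three four'), "]\n";
```

        The result keeps both spaces that surrounded "three", just as the replace version does.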

        The PerlMonk tr/// Advocate