Re: Re: Re: Variable-width negative lookbehind

Regexp::Keep (was: Variable-width negative lookbehind) by Roy Johnson (Monsignor) on May 07, 2004 at 15:17 UTC
I don't want to run down Regexp::Keep, because I think it's nifty. But I do want to point out a few things:

- It doesn't provide any help with variable-width negative lookbehind (which the OP thought was needed).
- Depending on the regexp, it can be faster or slower than the two-regexp method.
- It's slightly broken (see below).

Here's some benchmarking code:

    use Benchmark 'cmpthese';
    use strict;
    use Regexp::Keep;

    my @strings = ('one two three four', 'two three four five', 'one three five');
    my @copy;
    sub replace { s/(o+ )three/$1/ for @copy=@strings }
    sub keep    { s/o+ \Kthree// for @copy=@strings }
    sub two     { /o+ (?=three)/g and s/\Gthree// for @copy=@strings }

    cmpthese(-3, {
        'replace' => \&replace,
        'keep'    => \&keep,
        'two'     => \&two
    });
    replace;
    print "Replace: ", join("\n", @copy), "\n";
    keep;
    print "Keep: ", join("\n", @copy), "\n";
    two;
    print "Two: ", join("\n", @copy), "\n";

As written, I got these results:

              Rate replace    keep     two
    replace 4739/s      --    -15%    -26%
    keep    5587/s     18%      --    -13%
    two     6386/s     35%     14%      --
    Replace: one two  four
    two  four five
    one three five
    Keep: one two four
    two four five
    one three five
    Two: one two  four
    two  four five
    one three five

Note that keep yields different output: it doesn't keep the space. If I move the \K to follow the "t" in "three" (or anywhere in "three" -- even at the end of the pattern!), it doesn't affect the output.

If I change the o+ to an e+, keep wins by a hair. If I change it to an r (so that nothing matches or is substituted), replace wins and two loses.

The lookahead is necessary for non-global replacement. For global, the subs can look like this:

    sub replace { s/(o+ )three/$1/g for @copy=@strings }
    sub keep    { s/o+ \Kthree//g for @copy=@strings }
    sub two     { do {s/\Gthree// while /o+ /g} for @copy=@strings }

and, in addition to being much neater, keep wins by a hair.

Update: Interestingly, this one-match alternative is significantly slower -- about a third slower than replace.

    sub two { /o+ (?=(three))/g and substr($_, $-[1], $+[1]-$-[1], '') for @copy=@strings }

The PerlMonk `tr///` Advocate
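For anyone who wants to check the "keep should give the same output as replace" expectation without the module, here is a minimal sketch. It assumes perl 5.10 or later, where `\K` is a core regex feature, so it does not exercise Regexp::Keep itself. The helper names `replace_copy` and `keep_copy` are made up for this illustration.

```perl
use strict;
use warnings;

# Assumption: perl 5.10+ with built-in \K; Regexp::Keep is NOT loaded here.
my @strings = ('one two three four', 'two three four five', 'one three five');

# Capture-and-restore: the prefix matched by (o+ ) is put back via $1.
sub replace_copy { my @c = @_; s/(o+ )three/$1/ for @c; return @c }

# \K: text matched before \K is left out of the part being replaced.
sub keep_copy { my @c = @_; s/o+ \Kthree// for @c; return @c }

my @replaced = replace_copy(@strings);
my @kept     = keep_copy(@strings);

# Brackets make the whitespace visible when comparing the two results.
for my $i (0 .. $#strings) {
    printf "%-22s replace=[%s] keep=[%s]\n",
        "'$strings[$i]'", $replaced[$i], $kept[$i];
}
```

With a working `\K` the two columns should agree, including the doubled space left where "three" was removed. A single space there is the symptom described above.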
Re: Regexp::Keep (was: Variable-width negative lookbehind) by japhy (Canon) on May 07, 2004 at 17:25 UTC
What my module does is change where `PL_regstartp[0]` is, and it does it based on `PL_reginput`. `PL_reginput` is set too late when a regex like `/ab\Kc/` is used -- it's set after the 'a' is matched, but not after the 'b' is matched, because of the EVAL node right after it, so we're using an old value.

I've just fixed the module to get around this. Instead of `/ab\Kc/` becoming `/ab(?{Regexp::Keep::KEEP})c/`, it becomes `/ab.{0}(?{Regexp::Keep::KEEP})c/`. This forces `PL_reginput` to be updated before the EVAL is entered.

_____________________________________________________
Jeff`[japhy]`Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
`s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`
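To make the rewrite described above concrete, here is a rough sketch of that kind of source-level substitution done on a pattern string. This is not the module's actual code: `expand_keep` is a hypothetical helper, and only the replacement text `.{0}(?{Regexp::Keep::KEEP})` is taken from the post. The sketch only builds the rewritten string and never compiles it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Regexp::Keep: rewrite each \K in a
# pattern string into the zero-width form quoted in the post.
sub expand_keep {
    my ($pattern) = @_;
    my $keep = '.{0}(?{Regexp::Keep::KEEP})';

    # Match \K only when its backslash is not itself escaped: skip any
    # run of doubled backslashes (captured in $1) before the real \K.
    (my $out = $pattern) =~ s/(?<!\\)((?:\\\\)*)\\K/$1$keep/g;
    return $out;
}

print expand_keep('ab\Kc'), "\n";      # contains \K, so it is rewritten
print expand_keep('a\\\\Kb'), "\n";    # escaped backslash then K, left alone
```

The `.{0}` matches nothing, so the rewritten pattern accepts the same strings. Per the explanation above, its only job is to sit between the `b` and the EVAL node.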