
I hoped to fix the two-pass version by adding a check to make sure this character wasn't already earmarked to become a "B" before accepting it to become an "A":

s/(?<!A...)(.)(?=...\1)/A/g; s/(?<=A...)(.)/B/g;
but that still fails in exactly the same way, because the regexp engine goes to some effort to make sure that the pattern is matched against the original unmodified string each time through an s///g.
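A small script makes the failure concrete. It runs the two-pass substitution above on the test string Enlil uses further down the thread, and compares the result with a reference loop in the style of the substr solution mentioned later, which checks one position at a time so that each check sees the letters written by earlier ones. The subroutine names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the two-pass version exactly as posted above.
sub two_pass {
    my ($s) = @_;
    # Inside s///g the (?<!A...) test looks at the original $s, which has no 'A' yet.
    $s =~ s/(?<!A...)(.)(?=...\1)/A/g;
    # Likewise, (?<=A...) only ever sees the output of the first pass as a whole.
    $s =~ s/(?<=A...)(.)/B/g;
    return $s;
}

# Hypothetical reference: test each 5-character window in turn,
# on the string as modified so far.
sub one_at_a_time {
    my ($s) = @_;
    for my $i (0 .. length($s) - 5) {
        substr($s, $i, 5) =~ s/(.)(...)\1/A$2B/;
    }
    return $s;
}

my $input = "17341234" x 4;
my $want  = one_at_a_time($input);
my $got   = two_pass($input);
print "reference: $want\n";
print "two-pass:  $got\n";
print( ($got eq $want) ? "same\n" : "different\n" );
```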

It is also unfortunate that s/// doesn't support things like the //gc flags the way m// does; it would have been handy to be able to solve this with something like:

pos($_) -= 3 while s/(.)(...)\1/A$2B/gc;
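Something close to what that hypothetical s///gc line is reaching for can be written as an ordinary m//g loop that patches the string with four-argument substr and rewinds pos() by hand. This is only a sketch, not a built-in feature of s///; it restarts one character after each new "A", which is the same restart point Enlil's reply below uses.

```perl
use strict;
use warnings;

$_ = "17341234" x 4;

# Find the next X...X window with m//g, patch its two ends in place,
# then point pos() at the character just after the new "A" so the
# next search runs against the modified string.
while (/(.)(...)\1/g) {
    my $start = $-[0];               # offset of the first X in this match
    substr($_, $start,     1, 'A');
    substr($_, $start + 4, 1, 'B');
    pos($_) = $start + 1;            # set explicitly rather than relying on where the match left it
}

print "$_\n";
```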

Hmm, frustrating - I feel sure there must be a simple and efficient solution.

Hugo

Re: Re: Re: Replace zero-width grouping?
by Enlil (Parson) on May 07, 2003 at 08:14 UTC
    I don't know whether this is exactly the kind of solution you and BrowserUk were working towards, but it seems to work, and it should cut down on the rescanning. I left the (?{print pos($_)."\n"}) in (it can be taken out if it is too much of an eyesore) to show where each match attempt starts; uncommenting the use re 'debug'; line seems to confirm this. Anyhow, here is the code:
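    use strict;
    use warnings;
    #use re 'debug';

    $_ = "17341234173412341734123417341234";
    my $i;

    # \G anchors each attempt at pos(); (.*?) lazily skips ahead to the next
    # X...X window, and the second code block saves the restart point
    # (the character just after the new "A") in $i.
    pos() = $i
        while s<\G (?{print pos($_) . "\n"}) (.*?)(.)(...)\2 (?{$i=pos()-4}) >
               <defined $1 ? $1."A".$3."B" : "A".$3."B">ex;
    print;
    __END__
    0
    1
    3
    4
    9
    11
    12
    17
    19
    20
    25
    27
    28
    A7AAB2BBA7AAB2BBA7AAB2BBA7AAB2BB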
    Update: FWIW, I did quite a bit of benchmarking using cmpthese from the Benchmark module, and after all the hand-waving and tweaks, nothing seems to run faster than BrowserUk's substr solution, which is probably the one I would go with for long strings of digits. I would make one last tweak first, though: changing the first . to a \d.
    That is:
    substr($a, $_, 5) =~ s[(\d)(...)\1][A$2B] for 0 .. length ($a);
    This should reduce the work the regex engine has to do, since every position that does not start with a digit (and hence has already been turned into a letter :) fails on its first character. For that matter, the original could have been improved the same way:
    s!(\d)(...)\1!A$2B!;
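A rough harness for the kind of comparison described in the update, assuming the core Benchmark module and a made-up test string (the sub names and input are not from the thread). The two subs differ only in . versus \d as the first atom; timings depend on the machine and perl version, so no figures are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = "17341234" x 200;    # arbitrary all-digit test data

# Hypothetical wrapper: the substr solution with '.' as the first atom.
sub with_dot {
    my ($s) = @_;
    substr($s, $_, 5) =~ s[(.)(...)\1][A$2B] for 0 .. length($s) - 5;
    return $s;
}

# Hypothetical wrapper: the same loop with the first atom tightened to \d,
# so windows that already start with an A or B fail at their first character.
sub with_digit {
    my ($s) = @_;
    substr($s, $_, 5) =~ s[(\d)(...)\1][A$2B] for 0 .. length($s) - 5;
    return $s;
}

# Make sure both variants agree before timing them.
die "results differ\n" unless with_dot($input) eq with_digit($input);

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    dot   => sub { with_dot($input) },
    digit => sub { with_digit($input) },
});
```

On all-digit input the two should always agree: when a window starts with a letter, its fifth character has not been touched yet and is still a digit, so the . version can never match there either.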

    -enlil

Re: Re: Re: Replace zero-width grouping?
by BrowserUk (Patriarch) on May 07, 2003 at 02:24 UTC

    Yes. I kept feeling there was a solution in there somewhere, but I'll be darned if I could find it.

    I tried playing games with \G as part of a positive look-behind assertion, but, as the 5.8 copy of perlre documents, that isn't really supported. Constraining the match with an lvalue substr was the best I could come up with.
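For readers unfamiliar with the lvalue-substr trick: binding s/// to substr() makes the regex see only that window, and any replacement is written back into the original string in place. A minimal illustration on a short made-up string:

```perl
use strict;
use warnings;

my $s = "17341234";

# On the whole string, s/(.)(...)\1/ would match at offset 0 ("17341").
# Bound to substr($s, 2, 5), the regex only sees the window "34123",
# and the replacement is written back into $s at offset 2.
substr($s, 2, 5) =~ s/(.)(...)\1/A$2B/;
print "$s\n";    # 17A412B4

# A window with no X...X pair inside it does not match, and $t is left alone.
my $t = "17341234";
my $n = (substr($t, 1, 5) =~ s/(.)(...)\1/A$2B/);    # window "73412"
my $msg = $n ? "matched" : "no match";
print "$msg\n";    # no match
```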


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller