Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I was reading the post for Regular Expression Help and it occurred to me that a zero-width positive lookbehind would solve the problem neatly. As it turns out, there are a couple of problems with that solution. The first is that a lookbehind only accepts fixed-length expressions. If you're not sure how long the text preceding your target text is, you're out of luck.
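
The fixed-length restriction can be seen by trying to compile a lookbehind that contains an unbounded quantifier. This is only a minimal sketch: the helper name `compiles` and the sample string are made up for illustration, and the `\K` alternative at the end assumes Perl 5.10 or later, which postdates this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the pattern string compiles, 0 otherwise.
sub compiles {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? 1 : 0;
}

# Fixed-length lookbehind: exactly five characters, so this compiles.
my $fixed_ok = compiles('(?<=[A-Z]{5}):');

# Unbounded lookbehind: "+" has no upper limit, so perl refuses to compile it.
my $unbounded_ok = compiles('(?<=[A-Z]+):');

print "fixed-length lookbehind compiles: $fixed_ok\n";
print "unbounded lookbehind compiles:    $unbounded_ok\n";

# Assumption: Perl 5.10+. \K drops everything matched so far from the
# text being replaced, which covers the variable-length case.
my $text = "ASDFEGH:asdfe";
(my $changed = $text) =~ s/^[A-Z]+\K:/: /;
print "$changed\n";
```

With `\K` the leading caps are still matched, and may be of any length, but only the colon is replaced.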

Let's, however, assume for the sake of argument that the problem in Regular Expression Help was the following: we need to match the beginning of the string, followed by five capital letters and a colon, and then substitute out the colon. The following is the closest I could get:

    #!/usr/bin/perl -w
    use strict;

    my $test = "ASDFE:asdfe";
    $test =~ s!               # Substitute
        (?<=                  # Zero-width positive lookbehind
            [A-Z]{5}          # Five caps
        )                     # end lookbehind
        :                     # Substituting a colon (but not the preceding characters)
    !:</B>\n<BR><BR>!x;
    print $test;

The problem is that this does not anchor the five caps to the beginning of the string. Is this impossible with a lookbehind?

Cheers,
Ovid

Re: Yet Another Regex Question
by tye (Sage) on Aug 11, 2000 at 21:29 UTC

    Well, I'd just do this as

    $test =~ s#^([A-Z]{5}):#$1:</B>\n<BR><BR>#;

    But, yes, you can do it with look behinds:

    $test =~ s#(?<!.{6})(?<=[A-Z]{5}):#:</B>\n<BR><BR>#;

    Okay, now I feel dirty. (:

            - tye (but my friends call me "Tye")
      tye, I agree that your first regex would work, but I was specifically looking to see if I could use a lookbehind with a string anchor such as '^'. Your second regex, however, leaves me confused:
      $test =~ s{
          (?<!            # Negative lookbehind
              .{6}        # Six of any non-newline character
          )
          (?<=            # Positive lookbehind
              [A-Z]{5}    # Five caps
          )
          :               # Here's what we're really substituting
      }{:</B>\n<BR><BR>}x;
      In other words, we're going to substitute out a colon, but only if that is preceded by five caps, but only if the five caps in turn are not preceded by six non-newline characters. Is that what you intended? If so, was that intended as a work-around for the '^' anchor? If it was, I see a lot of problems with it.

      Regardless, I'm stuck with my original question: can regex string anchors and lookbehinds be combined?

      Cheers,
      Ovid

      Update: tye, read my commentary in addition to the regex comments and you'll see that we agree. My wording may have been odd, but it's there. :)

        No, look-behinds are zero-width so the negative look-behind says "there are not 6 (non-newline) characters in front of the colon".

        So we want a colon that has 5 caps in front of it and that doesn't have 6 characters in front of it.
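
That reading can be checked against a few inputs. The sketch below is only an illustration: the helper names `with_anchor` and `with_lookbehinds` and the sample strings are invented here, while the two substitutions are the ones from tye's first reply. The last sample shows one input where the two forms differ: without `/s`, `.` does not match a newline, so the negative lookbehind is satisfied when a newline sits among the six preceding characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First form from the thread: string anchor plus a capture.
sub with_anchor {
    my ($s) = @_;
    $s =~ s#^([A-Z]{5}):#$1:</B>\n<BR><BR>#;
    return $s;
}

# Second form from the thread: two lookbehinds, nothing captured.
sub with_lookbehinds {
    my ($s) = @_;
    $s =~ s#(?<!.{6})(?<=[A-Z]{5}):#:</B>\n<BR><BR>#;
    return $s;
}

my @samples = (
    "ASDFE:asdfe",      # five caps at the start of the string
    "xASDFE:asdfe",     # six characters in front of the colon
    "ASDF:asdfe",       # only four caps in front of the colon
    "x\nASDFE:asdfe",   # newline among the six preceding characters
);

for my $s (@samples) {
    my $verdict = with_anchor($s) eq with_lookbehinds($s) ? "same" : "differ";
    (my $shown = $s) =~ s/\n/\\n/g;   # make the newline visible in the report
    print "$shown: $verdict\n";
}
```

Adding `/s` to the lookbehind form should make `.` match the newline as well, which brings the two forms back in line for that last input.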

        Okay, I side-stepped your question. I don't know why ^ doesn't work in look-behinds. It might be possible to convince someone that this is a bug. But you can't convince me of that because I refuse to patch regex code. (:

                - tye (but my friends call me "Tye")