
negative lookahead did not work as expected. Let's focus on:
("gugus" =~ /gu(?!gu)/) ==> <gugus> found ("gu", "gu", "s")
and
("gugis" =~ /gu(?!gi)/) ==> <gugis> nomatch (undef, undef, undef)
These two do not behave the same.
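
The two results quoted above can be reproduced with a short script along these lines. This is only a sketch: the helpers `match_parts` and `show` are names I made up for illustration, and the prematch/match/postmatch split is rebuilt from the `@-` and `@+` offset arrays rather than taken from whatever code produced the quoted output.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original posts): returns the text before
# the match, the match itself, and the text after it, or three undefs if the
# regex does not match. @- and @+ hold the start/end offsets of the last
# successful match.
sub match_parts {
    my ($string, $regex) = @_;
    return (undef, undef, undef) unless $string =~ $regex;
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($string, 0, $start),
        substr($string, $start, $end - $start),
        substr($string, $end),
    );
}

# Hypothetical helper: render a list the way the quoted output shows it.
sub show {
    return '(' . join(', ', map { defined $_ ? qq("$_") : 'undef' } @_) . ')';
}

my @gugus = match_parts("gugus", qr/gu(?!gu)/);
my @gugis = match_parts("gugis", qr/gu(?!gi)/);

print show(@gugus), "\n";   # ("gu", "gu", "s")
print show(@gugis), "\n";   # (undef, undef, undef)
```

Both results are what the regexes are supposed to produce, as the walkthrough that follows explains.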

Look at it this way:

  1. "gugus" =~ /gu(?!gu)/

    1. Where does /gu/ match in "gugus"?
      "gugus"       "gugus"
       ^^ here  and    ^^ here
    2. What is the string following the match?
      "gugus"       "gugus"
       ^^ "gus" and    ^^ "s"
    3. Does that string match /^gu/ (negative lookahead)?
      "gugus"                 "gugus"
       ^^ "gus" yes!             ^^ "s" no!
       => match fails,           => match succeeds
          keep looking
    4. Overall, the match succeeds at the second "gu":
      "gugus"
         ^^

      <update2> I realized the above might be a little misleading: The regex engine always works from left to right, so it does not execute the steps in the order I've shown here. It first fully inspects the leftmost match before continuing on to find the second "gu" in the string. </update2>

  2. "gugis" =~ /gu(?!gi)/

    1. Where does /gu/ match in "gugis"?
      "gugis"
       ^^ here
    2. What is the string following the match?
      "gugis"
       ^^ "gis"
    3. Does that string match /^gi/ (negative lookahead)?
      "gugis"
       ^^ "gis" yes! => match fails
    4. Overall, the match fails.
  3. "start" =~ /(?!start)/

    1. Where does // match in "start"? (whitespace added to show the "zero-length strings" around each character, as in ""."s".""."t".""."a".""."r".""."t"."")
       v v v v v v   (everywhere)
      " s t a r t "
    2. What is the string following the match?
       "start"
       | "tart"
       | | "art"
       | | | "rt"
       | | | | "t"
       | | | | | ""
       v v v v v v
      " s t a r t "
    3. Does that string match /^start/ (negative lookahead)?
       "start" yes => fails
       | "tart" no => succeeds
       | | "art" no => succeeds
       | | | "rt" no => succeeds
       | | | | "t" no => succeeds
       | | | | | "" no => succeeds
       v v v v v v
      " s t a r t "
    4. Since the regex engine goes from left to right, and stops at the first place where it succeeds, overall, the match succeeds at the zero-length position between "s" and "t":
         v
      " s t a r t "
    5. Side note: If you make the regex engine continue matching where it last left off with /g, you can see the whole thing in action:

      $ perl -wMstrict -MData::Dump
      while ( "start" =~ /(?!start)/pg ) {
          dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
      }
      __END__
      ("s", "", "tart")
      ("st", "", "art")
      ("sta", "", "rt")
      ("star", "", "t")
      ("start", "", "")
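
The three walkthroughs above can also be mimicked in code. The following is a rough sketch, not a description of how the regex engine is implemented: `walk_candidates` is a hypothetical helper of mine that treats the part before the lookahead as a literal string, lists every offset where it occurs (step 1), takes the rest of the string (step 2), and tests that rest against the anchored "forbidden" pattern (step 3). The first candidate that passes corresponds to the leftmost match the engine would report.

```perl
use strict;
use warnings;

# Hypothetical helper: for each offset where the literal $prefix occurs in
# $string, return [ offset, rest-of-string, lookahead-ok ].
sub walk_candidates {
    my ($string, $prefix, $forbidden) = @_;
    my @rows;
    for my $offset (0 .. length $string) {
        # Step 1: does the part before the lookahead match at this offset?
        next unless substr($string, $offset, length $prefix) eq $prefix;
        # Step 2: the string following that match
        my $rest = substr($string, $offset + length $prefix);
        # Step 3: (?!...) succeeds only if $rest does NOT start with $forbidden
        my $ok = $rest =~ /^\Q$forbidden\E/ ? 0 : 1;
        push @rows, [ $offset, $rest, $ok ];
    }
    return @rows;
}

my %first_offset;
for my $case ( ["gugus", "gu", "gu"], ["gugis", "gu", "gi"], ["start", "", "start"] ) {
    my ($string, $prefix, $forbidden) = @$case;
    my @rows = walk_candidates($string, $prefix, $forbidden);
    # Leftmost candidate whose negative lookahead succeeds, if any
    my ($first) = grep { $_->[2] } @rows;
    $first_offset{$string} = defined $first ? $first->[0] : undef;
    printf "%s: %s\n", $string,
        defined $first ? "matches at offset $first->[0]" : "no match";
}
# gugus: matches at offset 2
# gugis: no match
# start: matches at offset 1
```

These offsets should agree with `$-[0]` after running the real regexes, which is an easy way to check your own mental model against the engine.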

If you could give some more complete examples of actual things you're trying to match and the actual regexes, that would probably help.

Minor edits for clarification.