in reply to Re: regex negativ lookahead
in thread regex negativ lookahead

Thank you for the helpful answer.

Well, why not use !~ ? Because it's a configuration value that accepts a regexp, and I do know the pattern for excluding values from an array. My "real problem script" does work with real data.
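A minimal sketch of how such an exclusion could be written as a single pattern, assuming the configuration value is a plain regex string that the script can only apply with =~. The names $exclude_cfg and @values are hypothetical and not taken from the real script. An anchored negative lookahead expresses "does not contain":

```perl
use strict;
use warnings;

# Hypothetical config value: a regex string that can only be applied with =~.
# ^(?!.*gugi) matches at the start of a string only if "gugi" occurs nowhere
# after it, so for single-line strings it behaves like  $_ !~ /gugi/ .
my $exclude_cfg = '^(?!.*gugi)';
my $re          = qr/$exclude_cfg/;

my @values = ('gugi', 'gugis', 'and gugit for', 'gugas and', 'gurit for', 'granit');

# Keep only the values that the configured pattern accepts.
my @kept = grep { $_ =~ $re } @values;

print join(', ', @kept), "\n";    # gugas and, gurit for, granit
```

Without the leading ^ and .* the lookahead only constrains what follows one particular position, which is why an unanchored /gu(?!gi)/ can still match further to the right.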

For now I am first trying to understand the Perl features involved, i.e. the Lookaround Assertions in perlre, as you already noted. First of all, every test with positive lookahead ($var =~ /a(?=string)/) works as expected.

Negative lookahead did not work as expected. Let's focus on:
("gugus" =~ /gu(?!gu)/) ==> <gugus> found ("gu", "gu", "s")
and
("gugis" =~ /gu(?!gi)/) ==> <gugis> nomatch (undef, undef, undef)
These do not behave the same. I used your proposal to dump the match variables. Great idea, thank you.

    use strict;
    use warnings;
    use Data::Dump;

    my @var1 = ('gugu', 'gugus', 'and gugut for', 'gugas and', 'gurit for', 'granit');
    my @var2 = ('gugi', 'gugis', 'and gugit for', 'gugas and', 'gurit for', 'granit');
    my ($ret, $match);

    print("get any string 'gu' not followed by 'gu'\n");
    foreach (@var1) {
        $ret   = $_ =~ /gu(?!gu)/p;
        $match = $ret ? 'found' : 'nomatch';
        print "<$_> $match ";
        dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
    }

    print("\nget any string 'gu' not followed by 'gi'\n");
    foreach (@var2) {
        $ret   = $_ =~ /gu(?!gi)/p;
        $match = $ret ? 'found' : 'nomatch';
        print "<$_> $match ";
        dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
    }

And now I'm about to transfer this problem into Test::More.
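A rough sketch of what the two focus cases might look like in Test::More. The test descriptions are made up for illustration, and the expected values are the ones from the dump output above:

```perl
use strict;
use warnings;
use Test::More;

# "gu" not followed by "gu": the first "gu" in "gugus" is rejected,
# but the second one (followed by "s") is accepted.
ok( 'gugus' =~ /gu(?!gu)/p, 'gugus matches /gu(?!gu)/' );
is( ${^PREMATCH},  'gu', 'the match starts after the first "gu"' );
is( ${^MATCH},     'gu', 'the matched text is the second "gu"' );
is( ${^POSTMATCH}, 's',  'only "s" follows the match' );

# "gu" not followed by "gi": the only "gu" in "gugis" is followed by "gi".
unlike( 'gugis', qr/gu(?!gi)/, 'gugis does not match /gu(?!gi)/' );

# A "gu" followed by something other than "gi" still matches.
like( 'gugas and', qr/gu(?!gi)/, 'gugas matches /gu(?!gi)/' );

done_testing();
```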

Re^3: regex negativ lookahead
by haukex (Archbishop) on Nov 14, 2017 at 15:43 UTC
    Negative lookahead did not work as expected. Let's focus on:
    ("gugus" =~ /gu(?!gu)/) ==> <gugus> found ("gu", "gu", "s")
    and
    ("gugis" =~ /gu(?!gi)/) ==> <gugis> nomatch (undef, undef, undef)
    These do not behave the same.

    Look at it this way:

    1. "gugus" =~ /gu(?!gu)/

      1. Where does /gu/ match in "gugus"?
        "gugus"          "gugus"
         ^^ here   and      ^^ here
      2. What is the string following the match?
        "gugus"          "gugus"
         ^^ "gus"  and      ^^ "s"
      3. Does that string match /^gu/ (negative lookahead)?
        "gugus"                          "gugus"
         ^^ "gus" yes!                      ^^ "s" no!
         => match fails, keep looking       => match succeeds
      4. Overall, the match succeeds at the second "gu" in "gugus" (prematch "gu", match "gu", postmatch "s").

        <update2> I realized the above might be a little misleading: The regex engine always works from left to right, so it does not execute the steps in the order I've shown here. It first fully inspects the leftmost match before continuing on to find the second "gu" in the string. </update2>

    2. "gugis" =~ /gu(?!gi)/

      1. Where does /gu/ match in "gugis"?
        "gugis"
         ^^ here
      2. What is the string following the match?
        "gugis"
         ^^ "gis"
      3. Does that string match /^gi/ (negative lookahead)?
        "gugis"
         ^^ "gis" yes! => match fails
      4. Overall, the match fails.
    3. "start" =~ /(?!start)/

      1. Where does // match in "start"? (whitespace added to show the "zero-length strings" around each character, as in ""."s".""."t".""."a".""."r".""."t"."")
         v v v v v v  (everywhere)
        " s t a r t "
      2. What is the string following the match?
         "start"
         | "tart"
         | | "art"
         | | | "rt"
         | | | | "t"
         | | | | | ""
         v v v v v v
        " s t a r t "
      3. Does that string match /^start/ (negative lookahead)?
         "start" yes => fails
         | "tart" no => succeeds
         | | "art" no => succeeds
         | | | "rt" no => succeeds
         | | | | "t" no => succeeds
         | | | | | "" no => succeeds
         v v v v v v
        " s t a r t "
      4. Since the regex engine goes from left to right, and stops at the first place where it succeeds, overall, the match succeeds at the zero-length position between "s" and "t" (prematch "s", postmatch "tart").
      5. Side note: If you make the regex engine continue matching where it last left off with /g, you can see the whole thing in action:

        $ perl -wMstrict -MData::Dump
        while ( "start" =~ /(?!start)/pg ) {
            dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
        }
        __END__
        ("s", "", "tart")
        ("st", "", "art")
        ("sta", "", "rt")
        ("star", "", "t")
        ("start", "", "")
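The three walkthroughs above can be replayed by hand in code. This is only a sketch: first_match_pos is a hypothetical helper, not how the regex engine is implemented, and it handles only a literal string followed by a literal negative lookahead. It does follow the same order, trying each position from left to right and returning the first one where the literal matches and the lookahead does not.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Perl: emulate /\Q$lit\E(?!\Q$not\E)/ by
# hand, using only substr comparisons, to make the left-to-right order visible.
sub first_match_pos {
    my ($str, $lit, $not) = @_;
    for my $pos (0 .. length($str) - length($lit)) {
        # Step 1: does the literal part match at this position?
        next unless substr($str, $pos, length($lit)) eq $lit;
        # Steps 2 and 3: look at what follows; the negative lookahead
        # fails if the rest starts with the forbidden string.
        my $rest = substr($str, $pos + length($lit));
        next if substr($rest, 0, length($not)) eq $not;
        # Step 4: the leftmost position that passes both checks wins.
        return $pos;
    }
    return;    # no position worked: the overall match fails
}

for my $case (['gugus', 'gu', 'gu'], ['gugis', 'gu', 'gi'], ['start', '', 'start']) {
    my $pos = first_match_pos(@$case);
    printf "%-7s => %s\n", qq("$case->[0]"),
        defined $pos ? "match at offset $pos" : "no match";
}
```

The offsets it reports (2 for "gugus", none for "gugis", 1 for "start") agree with the prematch lengths in the dumps above.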

    If you could give some more complete examples of actual things you're trying to match and the actual regexes, that would probably help.

    Minor edits for clarification.