SwissGeorge has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to get all strings that do not contain <start|sidebar>. The problem: I can't change the logic from <=~> to <!~>, which would work perfectly. From reading perldoc and other sources, a negative lookahead should solve this problem as well. This one works perfectly:
use strict;
use warnings;

my @vars = ('quit','quirly cat', 'cat queue','granit');

print ("any string 'qu' followed by 'it'\n");
foreach (@vars) {
    if ($_ =~ /qu(?=it)/) {
        print ("found <$_>\n");
    }
    else {
        print ("no match for <$_>\n");
    }
}

print ("\nany string 'qu' not followed by 'it'\n");
foreach (@vars) {
    if ($_ =~ /qu(?!it)/) {
        print ("found <$_>\n");
    }
    else {
        print ("no match for <$_>\n");
    }
}

This one already shows some problems:

use strict;
use warnings;

my @vars = ('and gugus for','gugus and', 'gurit for','granit');

print ("any string 'gu' followed by 'gu'\n");
foreach (@vars) {
    if ($_ =~ /gu(?=gu)/) {
        print ("found <$_>\n");
    }
    else {
        print ("no match for <$_>\n");
    }
}

print ("\nany string 'gu' not followed by 'gu'\n");
foreach (@vars) {
    if ($_ =~ /gu(?!gu)/) {
        print ("found <$_>\n");
    }
    else {
        print ("no match for <$_>\n");
    }
}

question: why does <gugus> show found, when <gu> followed by <gu> should be suppressed? what did i not understand correctly?

finally my real problem:

use strict;
use warnings;

my @test = ("ananas", "de:mindset:rule1", "en:mindset:rule1", "wiki:welcome",
            " de:sidebar", "sidebar", "en:sidebar", "start", "de:start", "en:start");
my ($cnt, $result);

print "start first loop <var !=> \n";
$cnt=0;
foreach (@test) {
    if ($_ !~ /(start|sidebar)/) {
        print " >$_<\n";
        $cnt++;
    }
}
$result = ($cnt eq 4? ">> correct result" : ">> wrong result");
print "end first loop <$cnt> is $result\n\nstart second loop <negative lookahead>\n";

$cnt=0;
foreach (@test) {
    if ($_ =~ /(?!start|sidebar)/) {
        print " >$_<\n";
        $cnt++;
    }
}
$result = ($cnt eq 4? ">> correct result" : ">> wrong result");
print "end second loop <$cnt> is $result\n";
print ">>pgm ended\n";

question: what is the correct notation for "find any string without start|sidebar" using negative lookahead?

thank you and

Replies are listed 'Best First'.
Re: regex negativ lookahead
by tybalt89 (Monsignor) on Nov 14, 2017 at 07:31 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1203332

    use strict;
    use warnings;

    my @test = ("ananas", "de:mindset:rule1", "en:mindset:rule1", "wiki:welcome",
                " de:sidebar", "sidebar", "en:sidebar", "start", "de:start", "en:start");
    my ($cnt, $result);

    print "start first loop <var !=> \n";
    $cnt=0;
    foreach (@test) {
        if ($_ !~ /(start|sidebar)/) {
            print " >$_<\n";
            $cnt++;
        }
    }
    $result = ($cnt eq 4? ">> correct result" : ">> wrong result");
    print "end first loop <$cnt> is $result\n\nstart second loop <negative lookahead>\n";

    $cnt=0;
    foreach (@test) {
        if ($_ =~ /^(?!.*start|.*sidebar)/s) {
            print " >$_<\n";
            $cnt++;
        }
    }
    $result = ($cnt eq 4? ">> correct result" : ">> wrong result");
    print "end second loop <$cnt> is $result\n";
    print ">>pgm ended\n";
Re: regex negativ lookahead
by haukex (Archbishop) on Nov 14, 2017 at 10:57 UTC
    what did i not understand correctly?

    You've already gotten some good answers, let me address this part: Lookaround Assertions are zero-width. So for example, /foo(?!bar)/ matches the same thing as /foo/, except that the next thing after the foo may not be bar, but it can be anything else, including the end of the string. So /(?!start|sidebar)/, because it matches zero characters, will match anywhere in the string where the next thing is not (start|sidebar), which is always true at the end of the string!*

    tybalt89's solution addresses this by anchoring the match at the beginning of the string. You haven't explained the bigger picture of what you're trying to do, so currently I don't understand why you want to use negative lookahead instead of !~ /(start|sidebar)/ - it sounds to me like you want to use this regex as part of a bigger regex, in which case additional considerations might need to be taken if you want to use lookaround assertions. If you could explain the context, perhaps we could suggest additional solutions.
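
The difference between the unanchored and the anchored lookahead can be seen in a minimal sketch like the one below. It is an illustration only: the shortened sample list and the variable names are assumptions, and the anchored pattern is written with a non-capturing group, which should behave the same as tybalt89's /^(?!.*start|.*sidebar)/s.

```perl
use strict;
use warnings;

# Shortened, illustrative sample list (not the full data from the question).
my @test = ("ananas", "wiki:welcome", "de:sidebar", "start", "en:start");

# Unanchored: the zero-width lookahead succeeds at some position in every
# string (at the latest, at the very end), so nothing is filtered out.
my @unanchored = grep { /(?!start|sidebar)/ } @test;

# Anchored: the lookahead is tried only at position 0 and scans ahead
# with .* for either word, so strings containing one are rejected.
my @anchored = grep { /^(?!.*(?:start|sidebar))/s } @test;

print "unanchored: @unanchored\n";   # all five strings
print "anchored:   @anchored\n";     # ananas wiki:welcome
```

With the anchor in place, =~ keeps only the strings that !~ /(start|sidebar)/ would have kept.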

    BTW, I like to test my regexes the way I showed in Re: How to ask better questions using Test::More and sample data.
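
As a rough sketch of that testing style (not the code from the linked node), the sample data from the question could be checked with Test::More along these lines. The pattern in $re and the split into @keep and @drop are assumptions made for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Candidate pattern under test: an anchored negative lookahead.
my $re = qr/^(?!.*(?:start|sidebar))/s;

# Strings that should pass the filter ...
my @keep = ("ananas", "de:mindset:rule1", "en:mindset:rule1", "wiki:welcome");
# ... and strings that should be rejected.
my @drop = ("de:sidebar", "sidebar", "en:sidebar", "start", "de:start", "en:start");

like(   $_, $re, "keeps '$_'"   ) for @keep;
unlike( $_, $re, "rejects '$_'" ) for @drop;

done_testing();
```

Each string becomes its own test, so a pattern that misbehaves shows exactly which input it gets wrong.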

    * Update: And in fact it's true in most places in the string.

    $ perl -wMstrict -MData::Dump
    "a"=~/(?!a)/p;
    dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
    "start"=~/(?!start)/p;
    dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
    __END__
    ("a", "", "")
    ("s", "", "tart")
      Thank you for the helpful answer.

      Well, why not use !~? Because it's a configuration value that accepts a regexp, and I do know the pattern for excluding values from an array. My "real problem script" does work with real data.
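
For a setup like this, where the calling code always applies =~ and only the pattern can be configured, one possible arrangement is sketched below. The %config hash, its filter key and the page names are hypothetical and stand in for whatever the real application uses.

```perl
use strict;
use warnings;

# Hypothetical configuration entry: the application only ever does
# "$value =~ $pattern", so the exclusion has to live inside the pattern.
my %config = ( filter => '^(?!.*(?:start|sidebar))' );

# Compile the configured string once; /s lets "." also match newlines.
my $filter = qr/$config{filter}/s;

my @pages    = ("ananas", "de:mindset:rule1", "sidebar", "en:start");
my @selected = grep { $_ =~ $filter } @pages;

print "$_\n" for @selected;   # prints ananas and de:mindset:rule1
```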

      For now I'm first trying to understand the Perl functions: perlre Lookaround Assertions, as you already noted. First of all, any tests with positive lookahead ($var =~ /a(?=string)/) work as expected.

      negativ lookahead did not work as expected. let's focus on:
      ("gugus" =~ /gu(?!gu)/) ==> <gugus> found ("gu", "gu", "s")
      and
      ("gugis" =~ /gu(?!gi)/) ==> <gugis> nomatch (undef, undef, undef)
      do not solve the same. I used your suggestion to dump the match variables. Great idea, thank you.

      use strict;
      use warnings;
      use Data::Dump;

      my @var1 = ('gugu', 'gugus', 'and gugut for','gugas and', 'gurit for', 'granit') ;
      my @var2 = ('gugi', 'gugis', 'and gugit for','gugas and', 'gurit for', 'granit') ;
      my ($ret, $match);

      print ("get any string 'gu' not followed by 'gu'\n");
      foreach (@var1) {
          $ret = $_ =~ /gu(?!gu)/p;
          $match = $ret eq 1 ? 'found' : 'nomatch';
          print "<$_> $match ";
          dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
      }

      print ("\nget any string 'gu' not followed by 'gi'\n");
      foreach (@var2) {
          $ret = $_ =~ /gu(?!gi)/p;
          $match = $ret eq 1 ? 'found' : 'nomatch';
          print "<$_> $match ";
          dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
      }
      and now i'm about to transfer this problem into Test::More.
        negativ lookahead did not work as expected. let's focus on:
        ("gugus" =~ /gu(?!gu)/) ==> <gugus> found ("gu", "gu", "s")
        and
        ("gugis" =~ /gu(?!gi)/) ==> <gugis> nomatch (undef, undef, undef)
        do not solve the same.

        Look at it this way:

        1. "gugus" =~ /gu(?!gu)/

          1. Where does /gu/ match in "gugus"?
            "gugus"          "gugus"
             ^^ here   and      ^^ here
          2. What is the string following the match?
            "gugus"          "gugus"
             ^^ "gus"  and      ^^ "s"
          3. Does that string match /^gu/ (negative lookahead)?
            "gugus"                         "gugus"
             ^^ "gus" yes!                     ^^ "s" no!
            => match fails, keep looking    => match succeeds
          4. Overall, the match succeeds at the second "gu" in "gugus".

            <update2> I realized the above might be a little misleading: The regex engine always works from left to right, so it does not execute the steps in the order I've shown here. It first fully inspects the leftmost match before continuing on to find the second "gu" in the string. </update2>

        2. "gugis" =~ /gu(?!gi)/

          1. Where does /gu/ match in "gugis"?
            "gugis"
             ^^ here
          2. What is the string following the match?
            "gugis"
             ^^ "gis"
          3. Does that string match /^gi/ (negative lookahead)?
            "gugis"
             ^^ "gis" yes!
            => match fails
          4. Overall, the match fails.
        3. "start" =~ /(?!start)/

          1. Where does // match in "start"? (whitespace added to show the "zero-length strings" around each character, as in ""."s".""."t".""."a".""."r".""."t"."")
             v v v v v v   (everywhere)
            " s t a r t "
          2. What is the string following the match?
             "start"
             | "tart"
             | | "art"
             | | | "rt"
             | | | | "t"
             | | | | | ""
             v v v v v v
            " s t a r t "
          3. Does that string match /^start/ (negative lookahead)?
             "start" yes => fails
             | "tart" no => succeeds
             | | "art" no => succeeds
             | | | "rt" no => succeeds
             | | | | "t" no => succeeds
             | | | | | "" no => succeeds
             v v v v v v
            " s t a r t "
          4. Since the regex engine goes from left to right, and stops at the first place where it succeeds, overall, the match succeeds at the position between "s" and "t" (the second "v" above).
          5. Side note: If you make the regex engine continue matching where it last left off with /g, you can see the whole thing in action:

            $ perl -wMstrict -MData::Dump
            while ( "start" =~ /(?!start)/pg ) {
                dd ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
            }
            __END__
            ("s", "", "tart")
            ("st", "", "art")
            ("sta", "", "rt")
            ("star", "", "t")
            ("start", "", "")

        If you could give some more complete examples of actual things you're trying to match and the actual regexes, that would probably help.

        Minor edits for clarification.

Re: regex negative lookahead
by hippo (Archbishop) on Nov 14, 2017 at 09:30 UTC
    question: why does <gugus> show found, when <gu> followed by <gu> should be suppressed? what did i not understand correctly?

    Your code searches for ("gu" not followed by "gu") and "gugus" does indeed contain that if you look closely enough: it's the 3rd and 4th letters.
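
One way to check this for yourself is to print the offset at which the match starts. The sketch below is illustrative only: the three sample strings and the @report array are assumptions, and $-[0] is the 0-based start offset of the last successful match.

```perl
use strict;
use warnings;

my @report;
for my $str ("gugus", "gurit", "granit") {
    if ($str =~ /gu(?!gu)/) {
        # $-[0] holds the 0-based offset where the whole match starts.
        push @report, "<$str> matched at offset $-[0]";
    }
    else {
        push @report, "<$str> no match";
    }
}

print "$_\n" for @report;
# <gugus> matched at offset 2
# <gurit> matched at offset 0
# <granit> no match
```

Offset 2 in "gugus" is the third letter, so the match covers the second "gu", the one followed by "s".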