in reply to Matching a Word Exactly

Well, add more to the regex than just word boundaries ... see perlintro#More complex regular expressions and perlrequick (hint: it's anchors)

Re^2: Matching Exact Word
by Anonymous Monk on Oct 09, 2014 at 09:20 UTC
    Sleepy, I guess :) Not anchors; instead, match an optional word before Guinea, then check it:

    my( $word, $guinea ) = /(\w+)?\s*\b(Guinea)\b/;
    if( $word eq ucfirst $word ){
        warn "Not the Guinea I want ($word $guinea)";
    }

      This will fail in some cases.

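      Take, for example: "Geographically, Guinea is thousands of miles from here." (This fails immediately because of the comma: no word is captured, so $word is undefined and the eq test still succeeds, with warnings. If the comma were removed, it would still fail, because $word would be the capitalized "Geographically".)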
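A minimal sketch of those failure cases, assuming the check from the parent post is wrapped in a helper. The name rejects_guinea is hypothetical and not from the thread, and the helper swaps an undefined $word for the empty string only to silence the warnings; the outcome of the comparison is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread) wrapping the capitalized-word check.
# Returns 1 if the text would be rejected, 0 if accepted or if "Guinea" is absent.
sub rejects_guinea {
    my ($text) = @_;
    my ( $word, $guinea ) = $text =~ /(\w+)?\s*\b(Guinea)\b/;
    return 0 unless defined $guinea;
    # Treat a missing word as '' so the comparison behaves as in the original,
    # minus the "uninitialized value" warnings.
    $word = '' unless defined $word;
    return $word eq ucfirst $word ? 1 : 0;
}

for my $text (
    'I live in Guinea.',
    'Papua New Guinea',
    'Geographically, Guinea is thousands of miles from here.',
    'Geographically Guinea is thousands of miles from here.',
) {
    printf "%d %s\n", rejects_guinea($text), $text;
}
# This prints...
# 0 I live in Guinea.
# 1 Papua New Guinea
# 1 Geographically, Guinea is thousands of miles from here.
# 1 Geographically Guinea is thousands of miles from here.
```

The last two lines are false rejections: both sentences are about the country, yet the check throws them out.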

      If what you want is to match Guinea but not New Guinea or Equatorial Guinea, then what you probably really want is a negative lookbehind assertion that specifically rules out being preceded by "New " or "Equatorial ". Similarly, a negative lookahead assertion at the end can preclude Guinea Pig and Guinea-Bissau.

        "If what you want is to match Guinea but not New Guinea or Equatorial Guinea, then what you probably really want is a negative lookbehind assertion that specifically rules out being preceded by "New " or "Equatorial ""

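        One caveat: You can't combine "New " and "Equatorial " with alternation inside a single look-behind assertion, because the alternatives differ in length and variable-length look-behind isn't supported (Perl 5.30 and later accept a limited form of it experimentally). Instead, you must write a separate look-behind assertion for each alternative. You can, of course, use alternation in the look-ahead assertion.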

        use strict;
        use warnings;

        my $pattern = qr{
            (?<!New\s)
            (?<!Equatorial\s)
            Guinea
            (?![\s-](?:Bissau|pig))
        }ix;

        while (my $text = <DATA>) {
            my $match = $text =~ m/$pattern/ ? 1 : 0;
            print "$match $text";

            # This prints...
            # 0 Papua New Guinea
            # 1 I live in Guinea.
            # 1 i live in guinea, but i don't have a shift key.
            # 0 Guinea-Bissau
            # 0 Guinea Bissau
            # 0 Equatorial Guinea
            # 0 I love guinea pigs!
        }

        __DATA__
        Papua New Guinea
        I live in Guinea.
        i live in guinea, but i don't have a shift key.
        Guinea-Bissau
        Guinea Bissau
        Equatorial Guinea
        I love guinea pigs!
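If the list of words to exclude grows, the same pattern could be assembled from two lists rather than written out by hand. This is only a sketch under that assumption: build_guinea_pattern and its two parameters are hypothetical names, not something from the thread. It emits one fixed-length look-behind per preceding word and a single look-ahead with alternation for the following words.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "Guinea" matcher from a list of words that must
# not precede it and a list of words that must not follow it.
sub build_guinea_pattern {
    my ( $before, $after ) = @_;

    # One look-behind per word, because the words differ in length.
    my $behind = '';
    $behind .= '(?<!' . quotemeta($_) . '\s)' for @$before;

    # The look-ahead may use alternation freely.
    my $ahead = @$after
        ? '(?![\s-](?:' . join( '|', map { quotemeta($_) } @$after ) . '))'
        : '';

    return qr/${behind}Guinea${ahead}/i;
}

my $pattern = build_guinea_pattern( [qw(New Equatorial)], [qw(Bissau pig)] );

for my $text ( 'I live in Guinea.', 'Papua New Guinea', 'I love guinea pigs!' ) {
    printf "%d %s\n", ( $text =~ $pattern ? 1 : 0 ), $text;
}
# This prints...
# 1 I live in Guinea.
# 0 Papua New Guinea
# 0 I love guinea pigs!
```

With these two lists the generated pattern is equivalent to the hand-written one above, just without the /x layout.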

        You are right. However, even this code may fail if somebody misspells the country names.

        It is more a linguistic problem than a pattern-recognition one and, as such, seems extraordinarily difficult to tackle in a foolproof way (which would require an AI, syntactic and contextual analysis, etc.).

        However, as you mentioned, using negative look-ahead and negative look-behind assertions should allow the original poster to rule out the most common unwanted neighbouring words.