 
PerlMonks  

Re^2: Matching Exact Word

by Anonymous Monk
on Oct 09, 2014 at 09:20 UTC ([id://1103277])


in reply to Re: Matching Exact Word
in thread Matching a Word Exactly

Sleepy, I guess :) Not anchors; instead, match an optional word before Guinea, then check whether that word is capitalized:
    my ( $word, $guinea ) = /(\w+)?\s*\b(Guinea)\b/;
    if ( $word eq ucfirst $word ) {
        warn "Not the Guinea I want ($word $guinea)";
    }

Re: Matching Exact Word
by jonadab (Parson) on Oct 09, 2014 at 11:01 UTC

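    This will fail in some cases, for example:

    Geographically, Guinea is thousands of miles from here. (This fails immediately because of the comma: no word is captured before Guinea, so $word is undefined, and comparing it with ucfirst $word compares "" with "", which is true. If the comma were removed, it would still fail, because "Geographically" is capitalized.)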
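
To see both failures, here is a minimal sketch that wraps the parent post's check in a hypothetical helper, flags_guinea (not from the original posts), and runs it on the sentence with and without the comma. It keeps the original comparison unchanged, so an undefined $word compares as an empty string; uninitialized-value warnings are switched off for that comparison only, to keep the output readable.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the parent post's check would
# warn "Not the Guinea I want", 0 otherwise.
sub flags_guinea {
    my ($text) = @_;
    my ( $word, $guinea ) = $text =~ /(\w+)?\s*\b(Guinea)\b/;
    return 0 unless defined $guinea;    # no "Guinea" in the text at all

    # Same comparison as the parent post. An undefined $word
    # compares as "" eq "", which is true.
    no warnings 'uninitialized';
    return $word eq ucfirst $word ? 1 : 0;
}

for my $text (
    'Papua New Guinea',
    'I live in Guinea.',
    'Geographically, Guinea is thousands of miles from here.',
    'Geographically Guinea is thousands of miles from here.',
) {
    printf "%d %s\n", flags_guinea($text), $text;
}

# Prints:
# 1 Papua New Guinea
# 0 I live in Guinea.
# 1 Geographically, Guinea is thousands of miles from here.
# 1 Geographically Guinea is thousands of miles from here.
```

The last two lines are false positives: the first because no word is captured, the second because a sentence-initial word is capitalized like "New".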

    If what you want is to match Guinea but not New Guinea or Equatorial Guinea, then what you probably really want is a negative lookbehind assertion that specifically rules out being preceded by "New " or "Equatorial ". Similarly, a negative lookahead assertion at the end can preclude guinea pig and Guinea-Bissau.

      If what you want is to match Guinea but not New Guinea or Equatorial Guinea, then what you probably really want is a negative lookbehind assertion that specifically rules out being preceded by "New " or "Equatorial "

      One caveat: you can't use alternation of different-length alternatives in a look-behind assertion, because Perl doesn't support variable-length look-behind (Perl 5.30 later added limited, experimental support). Instead, write each alternative as its own look-behind assertion. You can, of course, use alternation in the look-ahead assertion.

      use strict;
      use warnings;

      my $pattern = qr{
          (?<!New\s)
          (?<!Equatorial\s)
          Guinea
          (?![\s-](?:Bissau|pig))
      }ix;

      while ( my $text = <DATA> ) {
          my $match = $text =~ m/$pattern/ ? 1 : 0;
          print "$match $text";
      }

      # This prints...
      # 0 Papua New Guinea
      # 1 I live in Guinea.
      # 1 i live in guinea, but i don't have a shift key.
      # 0 Guinea-Bissau
      # 0 Guinea Bissau
      # 0 Equatorial Guinea
      # 0 I love guinea pigs!

      __DATA__
      Papua New Guinea
      I live in Guinea.
      i live in guinea, but i don't have a shift key.
      Guinea-Bissau
      Guinea Bissau
      Equatorial Guinea
      I love guinea pigs!
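
Because each excluded prefix needs its own look-behind, the pattern can also be generated from word lists, which keeps the exclusions in one place. This is a minimal sketch under that assumption: the helper build_guinea_pattern and its two list arguments are hypothetical, and quotemeta is used so that any regex metacharacters in the lists are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a case-insensitive pattern matching "Guinea"
# unless it is preceded by one of @$prefixes (plus whitespace) or followed
# by whitespace or "-" and one of @$suffixes.
sub build_guinea_pattern {
    my ( $prefixes, $suffixes ) = @_;

    # One fixed-length negative look-behind per prefix, since the
    # prefixes have different lengths.
    my $behind = join '', map { '(?<!' . quotemeta($_) . '\s)' } @$prefixes;

    # Alternation is fine inside the look-ahead.
    my $ahead = '(?![\s-](?:' . join( '|', map { quotemeta($_) } @$suffixes ) . '))';

    return qr/${behind}Guinea$ahead/i;
}

my $pattern = build_guinea_pattern(
    [ 'New',    'Equatorial' ],
    [ 'Bissau', 'pig' ],
);

for my $text ( 'Papua New Guinea', 'I live in Guinea.', 'I love guinea pigs!' ) {
    printf "%d %s\n", ( $text =~ $pattern ? 1 : 0 ), $text;
}

# Prints:
# 0 Papua New Guinea
# 1 I live in Guinea.
# 0 I love guinea pigs!
```

Further variants, such as common misspellings of the neighbouring names, can then be appended to the lists without touching the regex itself.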

      You are right. However, even this code may fail if somebody misspells the country names.

      It is more a linguistic problem than a pattern-recognition one, and as such it seems extraordinarily difficult to tackle in a foolproof way (which would require AI, syntactic and contextual analysis, etc.).

      However, as you mentioned, using negative look-ahead and negative look-behind assertions should let the original poster exclude the most common false matches.
