in reply to Collecting Country in a paragraph
You need a negative look-behind to exclude 'New Mexico' (see perlre), and in general I suggest just keeping a list of regexes to check against, as in the code below.
Hmm.. /me thinks I should add making a Regexp::Common::Country module to my project list.

    my @countries = (    # make a list of regexes
        qr/(?<!New )Mexico/,
        'France',
        'Germany',
        qr/(?:(?:U\.S\.(?:A\.)?)|USA?|(?:The )?United States(?: of America)?)/,
    );
    my $s = do { local $/ = undef; <DATA> };    # slurp all of DATA into one string
    my %found;
    foreach my $re ( @countries ) {
        # NOTE: this is destructive to $s, but does get us the counts.
        $found{$1}++ while $s =~ s/\b($re)\b/====/s;
    }
    use Data::Dumper;
    print Dumper \%found;
    __DATA__
    Mexico is my favorite country. I like Mexico. Welcome to Mexico--a fun place.
    New Mexico
    US
    France
    Germany
    U.S.A.
    USA
    U.S.
    US
    United States
    United States of America
    The United States
    The United States of America
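For anyone who wants the counts without clobbering the input, here is a minimal sketch of a non-destructive variant. It is not part of the code above: `count_countries` is a hypothetical helper name, and the sample text and the shortened pattern list are made up for illustration. It also swaps `\b` for `(?<!\w)` and `(?!\w)`, because `\b` cannot match between a trailing period and whitespace, which matters for abbreviations like `U.S.A.`.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original post):
# count country mentions without modifying the input text.
sub count_countries {
    my ($text, @patterns) = @_;
    my %found;
    for my $re (@patterns) {
        # (?<!\w) and (?!\w) stand in for \b so that a pattern ending
        # in a period can still match when whitespace follows it.
        while ($text =~ /(?<!\w)($re)(?!\w)/g) {
            $found{$1}++;
        }
    }
    return \%found;
}

# Shortened, illustrative pattern list; longer alternatives come first
# so that "U.S.A." is not cut short to "U.S.".
my @countries = (
    qr/(?<!New )Mexico/,
    'France',
    qr/U\.S\.A\.|U\.S\.|USA|US/,
);

# Made-up sample text for the sketch.
my $text  = "I like Mexico. New Mexico is a state. France, U.S.A. and US.";
my $found = count_countries($text, @countries);

print "$_: $found->{$_}\n" for sort keys %$found;
```

One trade-off to be aware of: because nothing is replaced in the text, two patterns that can match the same stretch of text would each count it, so keep the patterns in the list mutually exclusive.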
Re^2: Collecting Country in a paragraph
by songahji (Friar) on Apr 26, 2006 at 14:41 UTC