in reply to missing simple 'is this a name' module on cpan?
Black, White, Green, Brown, May, Will, Mill, Hill, Bill, Hall, Wall, Fields, Woods, Forest, Frank, Earnest, Jewel, Ruby, Gold, Bond, Chuck, Pat, Sales, Miles, Mark, Irons, Steel, Rod, Reed, Robin, Lark, Singer, Ash, Birch, Lily, Iris, Rose, Drew, Dawn, Eve, Spring, Winter, Summer, Autumn, Laurel, Hardy, Abbot, Burns, Grace, Dolly.
Maybe what you really want is to run your file names through a set of procedures that will:
I don't know about that "Tagger" module, but most English POS taggers will provide the label "proper noun" where appropriate -- given that "appropriate" is based on a statistical likelihood. A really good tagger would return an N-best list rather than just a single "most likely" answer. (Do check out available resources beyond CPAN for POS taggers.)
Or maybe you want to try Lingua::EN::NamedEntity?
What you want to do with all that possible/probable information is another question. The point is, you are looking at a very complicated problem that often poses a challenge to native speakers (who are much better at it than Perl scripts, but even so cannot be perfect). You probably could have coded something quickly that would handle a majority of cases correctly (say, 65% or so), but getting beyond that range would take considerably longer.
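To make the "coded something quickly" idea concrete, here is a minimal sketch of a naive lookup, using core Perl only. The two word lists, the sample file name, and the sub names `classify_token` and `label_file_name` are all made up for illustration; a real attempt would need far larger lists and would still trip over words like the ones listed above.

```perl
use strict;
use warnings;

# Hypothetical, tiny sample lists (assumptions for this sketch only).
my %is_name = map { ( lc($_) => 1 ) } qw(Black Green May Will Frank Rose Mark Dawn Alice);
my %is_word = map { ( lc($_) => 1 ) } qw(black green may will frank rose mark dawn report);

# Label one token as 'name', 'word', 'ambiguous', or 'unknown'.
sub classify_token {
    my ($token) = @_;
    my $key  = lc $token;
    my $name = $is_name{$key};
    my $word = $is_word{$key};
    return 'ambiguous' if $name && $word;   # in both lists: cannot decide by lookup
    return 'name'      if $name;
    return 'word'      if $word;
    return 'unknown';
}

# Split a file name into alphabetic tokens and label each one.
sub label_file_name {
    my ($file_name) = @_;
    my @labels;
    for my $token ( $file_name =~ /([A-Za-z]+)/g ) {
        push @labels, [ $token, classify_token($token) ];
    }
    return @labels;
}

# Print one "token label" line per token of a made-up file name.
for my $pair ( label_file_name('Alice_May_report.txt') ) {
    printf "%-8s %s\n", @$pair;
}
```

With these sample lists, "May" comes back as ambiguous rather than as a name, which is exactly the class of case a plain lookup cannot settle; deciding it needs context, which is where a tagger or named-entity extractor would have to take over.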
(Disclaimer: I haven't personally used any of the modules cited above. Some or all of them might be totally unsuitable for your task.)