Hi, monks,
yeanling & bantling:
After looking over CPAN and doing some WAITing, I'm wondering if
I missed out on the big blinking sign that read "English stuff
is here!", or something. Is there a unified module that offers
a wide variety of English primitives and transforms for Natural
Language Processing? For instance, is there something like
Text::English, but more extensive? If not, I may be
interested in adding my code somewhere appropriate in
the tree as a starting point. So far no word from the author
of Text::English.
I'm doing a small contract that requires some auto-correlation
and such...
Text::English::stem has been invaluable. Thanks Martin
Porter, implementors, and others! I've also been thinking of taking
advantage of some of the lists at http://wordlist.sourceforge.net/
to hammer out some facilities for future English nightmares.
On an unrelated note, did you know that only a few special
places on the web have the following word sequence according to
Google: "Bring King Ling ring Bing Ding Sing spring swing"
(Wow, The Phonosemantics of Nasal-Stop Clusters and other music hits.).
Can you think of the longest such
m/[a-z]+ing/ match (where length > 5) that
will presumably trip up Porter's Stemmer? The
common thing here is that the ugly duckling word isn't a stemmable
-ing string: its -ing is supposed to cling, unlike in the word 'spelling'.
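For anyone who wants to hunt for such words mechanically, here is a rough sketch (in Python, purely for illustration). It assumes you supply your own word list, for example one of the lists mentioned above. The function names are made up for this example. The vowel test is my simplified reading of the condition Porter's algorithm applies before stripping -ing, and "the -ing is not a real suffix" is approximated by checking that no plausible base word appears in the same list. The pattern is anchored here, unlike the m/[a-z]+ing/ above.

```python
import re

# Anchored version of the [a-z]+ing pattern from the post.
ING_WORD = re.compile(r"[a-z]+ing")


def has_porter_vowel(stem):
    """Return True if stem contains a vowel in Porter's sense.

    Vowels are a, e, i, o, u, plus y when it follows a consonant.
    Because we return at the first vowel, every letter before
    position i is a consonant, so any non-initial y counts.
    """
    for i, ch in enumerate(stem):
        if ch in "aeiou" or (ch == "y" and i > 0):
            return True
    return False


def candidate_false_ings(words):
    """List words whose -ing a Porter-style stemmer would probably strip
    even though no matching base word exists in the supplied list.

    This is a heuristic, not a reimplementation of the stemmer.
    """
    vocab = {w.lower() for w in words}
    hits = []
    for w in vocab:
        if not ING_WORD.fullmatch(w):
            continue
        stem = w[:-3]
        if not has_porter_vowel(stem):
            continue  # e.g. "spring": nothing would be stripped anyway
        # Plausible base forms: spell(ing), mak(e)+ing, run(n)+ing.
        bases = {stem, stem + "e"}
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            bases.add(stem[:-1])
        if bases & vocab:
            continue  # looks like a genuine -ing form
        hits.append(w)
    # Longest first, then alphabetical.
    return sorted(hits, key=lambda w: (-len(w), w))


if __name__ == "__main__":
    sample = ["spring", "bring", "king", "duckling", "dumpling",
              "spelling", "spell", "fling", "flinging", "cageling"]
    print(candidate_false_ings(sample))
```

The quality of the result depends entirely on the word list: a real verb form whose base word is missing from the list will show up as a false alarm.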
Please help me find wordlists that detail English word relationships or other cool language algorithms (I'm no linguist).
Thanks, my darlings... (And don't go flinging your dumplings at the poor cageling! =] )
P.S. See Martin's Official PorterStemmer page and
http://snowball.sourceforge.net/ for more info on stemmers.