Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Not strictly Perl, but interesting to text processors, I think: does anyone know of a good dictionary or thesaurus in easily machine-readable format? I want something whereby I can easily parse words down to their root word. E.g., in the sentence above we would end up with the following words:

Not strict Perl but interest to text process I think do anyone know of a good dictionary or thesaurus in easy machine read format

Replies are listed 'Best First'.
Re: Computer-readable thesaurus
by clemburg (Curate) on Oct 12, 2001 at 22:20 UTC
      Additional information: Dan Brian, the author of the above-mentioned TPJ article, has also founded the Linguana Project to produce an open-source natural language processing system based on WordNet, Link Parser, and other applications still in development.
      You will find a paper about Linguana at Dan Brian's project page.

      Hanamaki
Re: Computer-readable thesaurus
by MZSanford (Curate) on Oct 12, 2001 at 17:49 UTC
    Rather than a dictionary/thesaurus, might I suggest Lingua::EN::Infinitive ... or just the Lingua namespace in general.
    The requirements change because they don't know what they want, or how much they own you.
      Lingua::Stem is probably the module you want to try.

      Hanamaki
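
      To illustrate the idea, here is a minimal pure-Perl sketch of the kind of suffix stripping a stemmer performs. Lingua::Stem does this far more carefully (it implements the Porter algorithm); the suffix list and ordering below are a toy approximation, not the module's actual rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stemmer: strip the first matching suffix, longest candidates
# first, keeping at least three characters of stem. A real stemmer
# (e.g. Lingua::Stem) applies context-sensitive rules instead.
sub naive_stem {
    my ($word) = @_;
    for my $suffix (qw(ingly edly ing ers ies ed er ly es s)) {
        if ($word =~ /^(.{3,})\Q$suffix\E$/) {
            return $1;
        }
    }
    return $word;
}

print join(' ', map { naive_stem(lc $_) } qw(strictly interesting processors)), "\n";
# prints: strict interest processor
```

      Note that a single pass only peels one layer of suffix ("processors" becomes "processor", not "process"), which is one reason the real modules are worth using.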
Re: Computer-readable thesaurus
by perrin (Chancellor) on Oct 12, 2001 at 18:11 UTC
    Jon Bjornstad's program to help his disabled friend read includes an interactive dictionary, which I think he got from Project Gutenberg. Take a look.
Re: Computer-readable thesaurus
by cheshirecat (Sexton) on Oct 13, 2001 at 01:16 UTC
    Hi,

    First post on perl monks, just signed up (great site)

    I think what you might be looking for is this:

    http://www.dcs.shef.ac.uk/research/ilash/Moby/

    It's in the public domain: word lists, a thesaurus, etc.

    Moby Hyphenator: 185,000 entries, fully hyphenated (mhyph.tar.Z, 980kB)
    Moby Language: word lists in five of the world's great languages (mlang.tar.Z, 2.3MB)
    Moby Part-of-Speech: 230,000 entries fully described by part(s) of speech, listed in priority order (mpos.tar.Z, 1.2MB)
    Moby Pronunciator: 175,000 entries fully International Phonetic Alphabet coded (mpron.tar.Z, 3.1MB)
    Moby Shakespeare: the complete unabridged works of Shakespeare (mshak.tar.Z, 2.3MB)
    Moby Thesaurus: 30,000 root words, 2.5 million synonyms and related words (mthes.tar.Z, 12MB)
    Moby Words: 610,000+ words and phrases, the largest word list in the world

    The Cheshire Cat (...is back)
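
    A quick sketch of loading the Moby Thesaurus into a Perl hash. As far as I recall each line of mthes is comma-separated with the root word first and its synonyms after it, but check the README in the tarball before relying on that; the here-doc below stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inline sample standing in for the (large) mthes file. Assumed format:
# one entry per line, comma-separated, root word first.
my $sample = <<'END';
cold,chilly,cool,frigid,icy
hot,burning,fiery,scalding
END

my %thesaurus;
for my $line (split /\n/, $sample) {
    my ($root, @synonyms) = split /,/, $line;
    $thesaurus{$root} = \@synonyms;
}

print "cold: @{ $thesaurus{cold} }\n";
# prints: cold: chilly cool frigid icy
```

    For the real 12MB file you would read line by line with a filehandle rather than slurping, but the split logic is the same.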

Re: Computer-readable thesaurus
by mischief (Hermit) on Oct 13, 2001 at 19:30 UTC

    You might want to take a look at dict.org and the files on their ftp site. They have several databases available along with client and server software you can use for reference.

(tye)Re: Computer-readable thesaurus
by tye (Sage) on Oct 12, 2001 at 19:03 UTC
Re: Computer-readable thesaurus
by pjf (Curate) on Oct 12, 2001 at 18:51 UTC
    Most *nix systems come with a dictionary of words, commonly in /usr/dict/words or /usr/share/dict/words.

    The common spelling utility, ispell, also comes with its own dictionaries, although the format isn't quite as simple as that of /usr/dict/words. If you have ispell installed, then you might want to glance at /usr/lib/ispell or /usr/local/lib/ispell to see if you can spot them. (Look for .hash files).

    These obviously don't contain word definitions or roots, just the words themselves. However, you can often infer the root word using English spelling rules. Again, the ispell source code would probably be a useful start here, since it does exactly that.

    Cheers,
    Paul
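
    A sketch of that inference idea: strip a candidate suffix, but only accept the result if it appears in the word list, which filters out most bad guesses. The inline hash below stands in for /usr/share/dict/words so the example is self-contained, and the suffix rules are my own rough guesses, not ispell's actual affix tables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for /usr/share/dict/words; in real use, read the file
# into a hash the same way.
my %dict = map { $_ => 1 } qw(strict interest process read easy);

sub infer_root {
    my ($word) = @_;
    # Special case: "-ily" often comes from a "-y" adjective (easily -> easy).
    if ($word =~ /^(.+)ily$/ && $dict{ $1 . 'y' }) {
        return $1 . 'y';
    }
    # Otherwise strip a suffix and keep the result only if it is a
    # known dictionary word.
    for my $suffix (qw(ingly ors ing ers ed er ly es s)) {
        if ($word =~ /^(.+)\Q$suffix\E$/ && $dict{$1}) {
            return $1;
        }
    }
    return $word;
}

print infer_root($_), "\n" for qw(strictly processors easily);
# prints: strict, process, easy (one per line)
```

    The dictionary check is what makes this workable: "processors" maps to "process" only because "process" is in the list, and unknown stems fall through untouched.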

Re: Computer-readable thesaurus
by Fletch (Bishop) on Oct 12, 2001 at 19:56 UTC

    For some value of `easy' you can always use LWP, HTML::TreeBuilder, and something like Merriam-Webster Online. Not Perl, but as a starting place for the URLs, here are some zsh functions I use:

    webster () {
        _gensearch $0 "http://www.m-w.com/cgi-bin/dictionary?va=" "$*"
    }
    thesaurus () {
        _gensearch $0 "http://www.m-w.com/cgi-bin/thesaurus?va=" "$*"
    }
      I'm pretty sure Merriam-Webster Online would not like people bypassing their ad revenue in this fashion.

      I know I pay for the bandwidth on my site, and spend some time blocking agents that steal my content like that.

      My advice: contact publishers before 'using' their work.

      Tiago

      Update:
      I'm sorry if this was read as flame bait, not my intention. I'm not sorry to have brought up copyright and terms of service. This is not a technical issue, but a moral and sometimes legal one, that developers should be aware of when making agents for the web.

      For example, using most Finance::Quote:: modules is against the terms of service of the sites that provide the data.

        I'm sure they'd also not like people to use lynx, which doesn't display ads. I'm sure they'd like people not to use junkbuster or other ad-blocking proxies. I'm sure they'd like people to mail them large envelopes full of cash.

        But they've put up their content on a publicly accessible web site. They're perfectly welcome (as are you) to implement whatever technological means to restrict access (of course most of those won't stop a truly determined person with the right know-how, but that's another issue :). But I see little reason to ask for permission to provide a URL which any webmonkey worth his bananas could deduce in under a minute with just a browser's `View Source' functionality. That URL does not magically give you any more access to their content than the form on their front page, just more convenient access.

        But this is getting off topic from the original question at hand.