

I don't think you realize what you've gotten into. Research into natural language processing (the official term) is at least 40 years old. Browse on citeulike for NLP, NLG (natural language generation), and NLU (...understanding). Also search Google Scholar.

The topic is extremely theoretical and relies heavily on Chomsky's theories of grammars, as well as tree theory. There are two main approaches: McKeown's (a name you will see a lot) is usually based on nested templates (IIRC) and requires a massive database of grammatical structures. Other approaches rely on deep semantic representations of the knowledge that is implied by grammatical structures.

A couple of things that you have failed to consider, each of which has massive amounts of discussion in the literature:
- Referring expressions (him, her, that, etc.)
- Focus (what's the subject? Is the sentence really _about_ the subject?)
- Homonyms (semantic homonyms, at least)
- fifty billion other things...
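
To make the first and third items concrete, here is a minimal Python sketch of a naive bag-of-words equivalence check. It is only an illustration under my own assumptions, not a method from the literature; the names `bag_of_words` and `naive_equivalent` are hypothetical.

```python
import re

def bag_of_words(sentence):
    """Lowercase the sentence and return its words as a sorted list."""
    return sorted(re.findall(r"[a-z']+", sentence.lower()))

def naive_equivalent(a, b):
    """Hypothetical check: same words in any order counts as 'equivalent'."""
    return bag_of_words(a) == bag_of_words(b)

# Same words, opposite meaning: word order carries the roles.
print(naive_equivalent("The dog bit the man.", "The man bit the dog."))  # True

# Same meaning once "him" is resolved, but the check cannot know that.
print(naive_equivalent("John left. Mary saw him.", "John left. Mary saw John."))  # False

# Identical strings, yet "bank" may be a riverbank or a financial institution.
print(naive_equivalent("I sat by the bank.", "I sat by the bank."))  # True
```

The first case is a false positive, the second a false negative, and in the third the strings match even though the meaning is still ambiguous. Surface comparison gets you nowhere near equivalence.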

The dismal state of machine translation ought to indicate how far we still have to go. Babelfish is rather state of the art, actually.

Good luck.

~e

Re^3: Theory time: Sentence equivalence
by tphyahoo (Vicar) on Dec 18, 2005 at 11:43 UTC
    On Mondays, I think linguists -- Chomsky, Pinker, the lot of them -- are pseudoscientists peddling bumhug. Kind of like certain bad-apple social scientists and continental philosophers -- see The Sokal Affair.

    On Tuesdays, I think maybe linguists are more like physicists than the wizards Sokal pulled back the curtain on.

    Enh, who knows. But a good place to start on the bad news of solving linguistics problems with computing is Pinker's The Language Instinct. Bumhug he may be, on Mondays, but I liked the book a lot anyways :)

Re^3: Theory time: Sentence equivalence
by DrWhy (Chaplain) on Dec 19, 2005 at 17:50 UTC
    Babelfish is rather state of the art, actually
    ...mmmm not really. Babelfish/Systran may be the biggest fish in the pond commercially, but they are hardly state of the art. The big splashes technically are being made by people looking at statistical techniques, such as Language Weaver (the company I work for), ISI, and Google.

    As for the original topic, I will just echo some of the other posters and warn you that you are taking some tiny first (mis?)steps on a journey whose destination is a long ways off. This is an intensely complex and interesting topic that you could spend your entire life on if you are sufficiently interested/motivated/compensated.

    --DrWhy

    "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."