in reply to Recognizing parts of speech

How do you score sentences like "He and I are hungry", "I'm not hungry, but she is", "No one is as tired as I am", etc.? (Or do you have some way of making sure that all your sentences follow some limited set of simple syntactic frames?)

I'm not asking for the sake of figuring out what sort of algorithm will address the problem. My point is simply to demonstrate why the skepticism cited by an Anonymous Monk elsewhere in this thread is well-deserved. Even if your plans for scoring have principled answers for things like conjoined head nouns, negation, empty trace slots, noun phrases referring to non-entities, etc., building a parser that associates adjectives with noun phrases the same way people do is a problem whose science is still in its infancy.

(A handful of NLP researchers have been moving it into "adolescence" -- you can check some papers by Eugene Charniak about automatic parsers, but I don't know about availability of source code. You can also check the CORPORA listserv archives for information on open-source or otherwise free parsers.)

I have not tried Lingua::LinkParser, so I don't know what it would do on my examples, or whether its output would meet your needs on such examples. If you have the time, it's worth a try, I'm sure. But if it's important to get the scoring done reasonably well in accordance with your designs, have a fall-back plan that optimizes the use of human scorers.

Sentences that contain none of your listed adjectives can be scored automatically; those that contain one or more adjectives and only one pronoun (and not much else) should also be easy to automate. Those that have one or more adjectives and two or more pronouns or other noun phrases need to be reviewed manually, whether or not you choose to hypothesize a score with a perl script.
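To make that triage concrete, here is a minimal sketch in Python (the same logic ports directly to a perl script). Everything named here is an assumption for illustration: the `ADJECTIVES` and `PRONOUNS` sets are placeholders for your own lists, and `max_words` is a hypothetical stand-in for "and not much else". It counts only pronouns, so it cannot see other noun phrases; any sentence with an adjective but no single clear pronoun is sent to a human.

```python
import re

# Placeholder word lists (assumptions); substitute the real adjective list.
ADJECTIVES = {"hungry", "tired", "happy", "sad"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}


def tokens(sentence):
    """Lowercase words; keep only the part before an apostrophe ("I'm" -> "i")."""
    return [w.split("'")[0] for w in re.findall(r"[a-z']+", sentence.lower())]


def triage(sentence, max_words=6):
    """Return "auto" if the sentence looks safe to score by script, else "manual".

    max_words is a hypothetical cutoff approximating "and not much else".
    """
    words = tokens(sentence)
    n_adj = sum(w in ADJECTIVES for w in words)
    n_pron = sum(w in PRONOUNS for w in words)

    if n_adj == 0:
        return "auto"    # no listed adjective: nothing to attribute
    if n_pron == 1 and len(words) <= max_words:
        return "auto"    # one pronoun, short sentence: attribution is unambiguous
    return "manual"      # anything else goes to a human scorer


if __name__ == "__main__":
    for s in ["He and I are hungry",
              "I'm not hungry, but she is",
              "No one is as tired as I am",
              "I am tired"]:
        print(triage(s), "-", s)
```

With these placeholder lists, the three example sentences at the top of this post all come out as "manual", which is the point: the script only decides where human attention goes, not what the score is.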

Re: Re: Recognizing parts of speech
by aquarium (Curate) on May 31, 2003 at 12:34 UTC
    It really depends on what you want to do with the scores, i.e. how accurate is good enough. For a rough but still fairly usable scoring system, you could just use averages instead: count how many times "I" appears in the text versus how many times "other" pronouns appear, and multiply each part of this ratio by the average of the counted adjective scores. Actually, the ratio figure alone will suffice for some kind of result.
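
    A rough sketch of that averaging idea, in Python for illustration. It is one possible reading of the suggestion, not a tested method: the `OTHER_PRONOUNS` set and the adjective scores passed in are made-up placeholders, and "each part of the ratio" is interpreted here as each group's share of all counted pronouns.

```python
import re

# Placeholder list of "other" pronouns (an assumption, not from the thread).
OTHER_PRONOUNS = {"you", "he", "she", "it", "we", "they",
                  "him", "her", "us", "them"}


def tokens(text):
    """Lowercase words; keep only the part before an apostrophe ("I'm" -> "i")."""
    return [w.split("'")[0] for w in re.findall(r"[a-z']+", text.lower())]


def pronoun_counts(text):
    """Count occurrences of "I" versus the other pronouns across the whole text."""
    words = tokens(text)
    first = sum(w == "i" for w in words)
    other = sum(w in OTHER_PRONOUNS for w in words)
    return first, other


def rough_scores(text, adjective_scores):
    """Split the average adjective score between "I" and everyone else.

    adjective_scores maps adjective -> numeric score (hypothetical values).
    Returns (score for "I", score for others); (0.0, 0.0) if nothing is found.
    """
    words = tokens(text)
    found = [adjective_scores[w] for w in words if w in adjective_scores]
    first, other = pronoun_counts(text)
    total = first + other
    if not found or total == 0:
        return 0.0, 0.0
    average = sum(found) / len(found)
    return first / total * average, other / total * average


if __name__ == "__main__":
    scores = {"hungry": 2, "tired": 4, "happy": 3}   # made-up scores
    text = "I am hungry. I am tired. She is happy."
    print(pronoun_counts(text))
    print(rough_scores(text, scores))
```

    This ignores which adjective goes with which pronoun, so it only makes sense over a whole text, not sentence by sentence.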