How do you score sentences like "He and I are hungry", "I'm not hungry, but she is", "No one is as tired as I am", and so on? (Or do you have some way of making sure that all your sentences follow a limited set of simple syntactic frames?)

I'm not asking because I want to work out what sort of algorithm would solve the problem. My point is simply to show why the skepticism voiced by an Anonymous Monk elsewhere in this thread is well deserved. Even if your scoring plan has principled answers for things like conjoined head nouns, negation, empty trace slots, and noun phrases that refer to non-entities, building a parser that associates adjectives with noun phrases the way people do is a science still in its infancy.

(A handful of NLP researchers have been moving it into "adolescence". You can look up Eugene Charniak's papers on automatic parsers, though I don't know whether his source code is available. You can also search the CORPORA listserv archives for information on open-source or otherwise free parsers.)

I have not tried Lingua::LinkParser, so I don't know what it would do with my examples, or whether its output on such sentences would meet your needs. If you have the time, it is surely worth a try. But if it matters that the scoring be done reasonably well according to your design, have a fallback plan that makes the best use of human scorers.

Sentences that contain none of your listed adjectives can be scored automatically; those that contain one or more adjectives and only one pronoun (and not much else) should also be easy to automate. Those that contain one or more adjectives and two or more pronouns or other noun phrases need to be reviewed manually, whether or not you have a Perl script propose a tentative score for them first.
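The three-way triage above can be sketched as a small Perl subroutine. This is a minimal sketch under stated assumptions, not a real scorer: the `triage` name, the adjective list, and the pronoun list are all hypothetical placeholders (your actual adjective list isn't shown in this thread), tokenizing is a crude split on non-letters, and only pronouns are counted, because spotting "other noun phrases" would take a real parser. A sentence with a listed adjective but no pronoun is routed to manual review, since the adjective must then attach to some noun phrase this sketch cannot see.

```perl
use strict;
use warnings;

# HYPOTHETICAL placeholder lists: substitute your own adjectives.
# Only pronouns are counted; other noun phrases would need a parser.
my %ADJECTIVE = map { $_ => 1 } qw(hungry tired happy sad);
my %PRONOUN   = map { $_ => 1 } qw(i me you he him she her it we us they them one);

# Hypothetical triage routine returning one of three bins:
#   'auto'   - no listed adjective, score automatically
#   'simple' - adjective(s) plus exactly one pronoun, likely automatable
#   'manual' - anything else (two or more pronouns, or an adjective whose
#              noun phrase this sketch cannot identify), send to a human
sub triage {
    my ($sentence) = @_;

    # Lower-case and split on non-letters; "I'm" becomes "i" and "m",
    # so the pronoun is still counted. Empty leading fields are dropped.
    my @words = grep { length } split /[^a-z]+/, lc $sentence;

    # grep in scalar context returns the number of matches.
    my $adjectives = grep { $ADJECTIVE{$_} } @words;
    my $pronouns   = grep { $PRONOUN{$_} } @words;

    return 'auto'   if $adjectives == 0;
    return 'simple' if $pronouns == 1;
    return 'manual';
}

for my $s ("He and I are hungry", "I'm not hungry, but she is",
           "No one is as tired as I am", "I am hungry", "The weather is nice") {
    printf "%-28s %s\n", $s, triage($s);
}
```

Run against the example sentences at the top of this post, all three land in the manual bin, which is the point: the cases that are hard for a parser are exactly the ones a word-counting script should hand to a human.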


In reply to Re: Recognizing parts of speech by graff
in thread Recognizing parts of speech by justinNEE
