Pointers to research?

At one level, this is what research in computational linguistics has been working on for years. Before attempting to invent (note that I didn't say reinvent :) this particular wheel by yourself, try googling for "constraint based grammars", "machine translation", "semantic equivalence" or whatever.

Moreover, I think you have a misconception about what constitutes "equivalence". The non-trivial issues of word order and punctuation have been pointed out above. A further problem is that sentences can be "equivalent" even when they not only have different grammatical structures but also contain very different words. For example:

- In New York, following the latest Fed rate cut, stocks rose across the board.

- The Federal Bank's further lowering of base rates boosted the NYSE and the NASDAQ.

- Wall Street reacted positively after Greenspan reduced interest rates again.

may be considered strictly equivalent, at least for some definition of equivalence, despite containing very few words in common.
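
To put a rough number on "very few words in common", here is a minimal sketch that computes the Jaccard overlap between the word sets of the three sentences above. It is in Python rather than Perl, and the tokenizer (lowercase, letters and apostrophes only) and the helper names `words` and `jaccard` are assumptions made for illustration, not anything proposed in this thread. It does no stemming, so "rate" and "rates" count as different words.

```python
import re
from itertools import combinations

# The three sample sentences from the post, copied verbatim.
SENTENCES = [
    "In New York, following the latest Fed rate cut, stocks rose across the board.",
    "The Federal Bank's further lowering of base rates boosted the NYSE and the NASDAQ.",
    "Wall Street reacted positively after Greenspan reduced interest rates again.",
]


def words(sentence):
    """Return the set of lowercase word tokens in a sentence.

    Assumption: a "word" is a run of letters and apostrophes; punctuation
    and word order are discarded entirely.
    """
    return set(re.findall(r"[a-z']+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sentences' word sets (0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0  # two empty sentences are trivially identical
    return len(wa & wb) / len(wa | wb)


if __name__ == "__main__":
    # Compare every pair of sentences and show what they share.
    for (i, a), (j, b) in combinations(enumerate(SENTENCES, start=1), 2):
        shared = sorted(words(a) & words(b))
        print(f"sentences {i} and {j}: overlap {jaccard(a, b):.3f}, shared {shared}")
```

With this tokenizer, no pair shares more than a single word ("the" or "rates"), and the first and third sentences share none at all, so any equivalence test built on surface word overlap would score these sentences as nearly unrelated.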

Anyway, best of luck in your project. I, for one, would be very interested in seeing your results.

dave

Update: Fixed typos, changed sample sentences slightly to make them more "equivalent".


In reply to Re: Theory time: Sentence equivalence by Not_a_Number
in thread Theory time: Sentence equivalence by BUU
