On a more serious note, the objective of the exercise seems to be to reduce the manual work involved in deriving questions that can be answered from having read a piece of text. Assuming that is the goal, I think this may be quite doable using Perl, but it would require a different tack from the one you have outlined.

Rather than trying to break the body of the text up into discrete chunks and then recombine them into possible answers, which a human being can then subset appropriately before deriving a set of questions, turn the process around.

That is to say, have a human being construct sets of questions from bodies of example text. Then write a program that takes the sets of questions and the bodies of text and attempts to derive patterns that relate the questions to sequences and relationships of words within the bodies of text.

It would require a good number of bodies of text and sets of questions to 'train' the program, and some reasonable mechanism to allow a human being to correct and refine the matched patterns over time.

Approaching the problem this way around means that the program does not have to perform any semantic analysis of either the text or the derived questions. It only needs to discover, extract, retain and refine patterns in text. Given Perl's backronym (Practical Extraction and Report Language), its powerful regex engine, renowned text handling facilities and good database handling, that seems (to me) like a problem Perl is eminently capable of tackling.
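To make the idea concrete, here is a minimal sketch of deriving one pattern from one human-written sentence/question pair and then reusing it on new text. Everything in it is an assumption made for illustration: the names derive_pattern and ask, the tiny %LITERAL list of function words, and the rule that any other word becomes a capture slot. It is a toy, not a worked-out design.

```perl
use strict;
use warnings;

# Hypothetical list of function words that stay literal in a pattern.
# Every other word in the training sentence becomes a capture slot.
my %LITERAL = map { $_ => 1 } qw(is the of a an in was by);

# Derive a regex plus a question template from one training pair.
sub derive_pattern {
    my ($sentence, $question) = @_;
    my (%slot, @regex);
    my $n = 0;
    for my $word ($sentence =~ /\w+/g) {
        if ($LITERAL{lc $word}) {
            push @regex, quotemeta $word;
        }
        else {
            $slot{$word} = ++$n;      # remember which capture holds this word
            push @regex, '(\w+)';
        }
    }
    # Replace question words that came from the sentence with {N} markers.
    (my $template = $question)
        =~ s/(\w+)/exists $slot{$1} ? '{' . $slot{$1} . '}' : $1/ge;
    my $re = join '\s+', @regex;
    return { regex => qr/\b$re\b/, template => $template };
}

# Apply a derived pattern to new text; returns undef when nothing matches.
sub ask {
    my ($pattern, $text) = @_;
    my @caps = $text =~ $pattern->{regex} or return;
    (my $q = $pattern->{template}) =~ s/\{(\d+)\}/$caps[$1 - 1]/ge;
    return $q;
}

my $pattern = derive_pattern(
    'Paris is the capital of France',
    'What is the capital of France?',
);
print ask($pattern, 'Canberra is the capital of Australia.'), "\n";
```

Run as is, this prints "What is the capital of Australia?". Feed it "Tolstoy is the author of Anna" and it will happily ask "What is the author of Anna?", which is exactly the kind of mistake the human correction step has to catch and refine.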

Of course, if you have a Neural Net handy, this is exactly the type of problem they are designed for: 'train the computer to recognise patterns in human heuristics, and then let it apply them for you'.

I briefly worked with an IBM product called "The Integrated Reasoning System" (TIRS) (about which I could find surprisingly little on-line), which was being used to encapsulate the judgments made by human insurance underwriters in arriving at policy costs for "non-standard" insurance risks. That is a far more complex process than deriving questions from a body of text. Having seen, with my own eyes, just how good it became, and how quickly, I wouldn't be too quick to dismiss the rather academic language in which most of the papers and articles on Neural Nets are couched. It may be tough going at first, but no tougher than the problem that you are trying to solve.

Oh, and good luck:)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

In reply to Re: The (futile?) quest for an automatic paraphrase engine by BrowserUk
in thread The (futile?) quest for an automatic paraphrase engine by dimar
