Without an example, I have trouble answering your question. However, suppose a very detail-oriented person who knows minimal English could sit down with 1,000 papers and summarize the results by extracting certain key phrases, even without knowing exactly what they mean. If that describes your situation, then the probability is high that a program can be written to do the same thing.

Programs don't work well with "sort of" or "interpret what you think about this...". "Recommend: Yes/No" is something that a program can detect. "I'm leaning towards voting Yes, but at this time, I am unsure" is something that a program has close to zero chance of figuring out.
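To make the difference concrete, here is a minimal sketch in Python. It assumes the papers carry a line of the form "Recommend: Yes" or "Recommend: No"; that exact layout, the pattern and the function name `extract_recommendation` are my assumptions, not something taken from your data. Anything that does not fit the strict format comes back as `None` rather than a guess.

```python
import re

# Assumed strict format: a line that contains only "Recommend: Yes" or "Recommend: No".
RECOMMEND_RE = re.compile(r"^\s*Recommend:\s*(Yes|No)\s*$",
                          re.IGNORECASE | re.MULTILINE)

def extract_recommendation(text):
    """Return 'yes' or 'no' if the text gives one unambiguous answer, else None."""
    answers = {match.lower() for match in RECOMMEND_RE.findall(text)}
    if len(answers) == 1:
        return answers.pop()
    # No matching line, or conflicting lines: the program cannot be certain.
    return None

clear = "Paper 17\nRecommend: Yes\n"
vague = "I'm leaning towards voting Yes, but at this time, I am unsure"
print(extract_recommendation(clear))
print(extract_recommendation(vague))
```

The first call finds the strict line and reports "yes". The second has no line in the agreed format, so the function declines to answer instead of trying to interpret the sentence.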

To have a chance at this, you need to identify some key phrases and a syntax that a very, very literal, detail-oriented person could use to extract your info. This very literal person (the program) will do its job flawlessly, but only within very strict rules. You could wind up in a situation where the program handles 900 of 1,000 files with a clear result, yet you are left with 100 to do manually. This comes down to the "rules" and whether the literal-minded savant (the program) can tell if it got a valid result or not. I've worked with situations where the program can handle 99.5% of the records with certainty, and for the other 0.5% it knows that it is not certain.

Update: 0.5% may not seem like a lot, but if there are 350,000 records, that is 1,750 records to check by hand, which is a big deal. Try to find some simple rules where you are absolutely certain that the correct result has been found. Then see what percentage of the files those rules cover. If that is 90%, then you are probably in pretty good shape, as the program did 90% of the work! To get something like this completely automated, the program may need to start applying some ad-hoc rules that involve some uncertainty, and that means the program will guess "wrong" some of the time. You have to decide whether that matters or not.
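A rough sketch of that "certain rules first, measure the coverage" approach, again in Python. The "Recommend: Yes/No" rule, the record names and the functions `classify` and `triage` are all hypothetical placeholders; you would substitute whatever rule you can be absolutely certain about for your own files.

```python
import re

# Assumed strict rule: a line that contains only "Recommend: Yes" or "Recommend: No".
STRICT_RE = re.compile(r"^\s*Recommend:\s*(Yes|No)\s*$",
                       re.IGNORECASE | re.MULTILINE)

def classify(text):
    """Return 'yes' or 'no' only when the strict rule gives one answer, else None."""
    answers = {match.lower() for match in STRICT_RE.findall(text)}
    return answers.pop() if len(answers) == 1 else None

def triage(records):
    """Split records into certain results and a pile left for manual review."""
    certain, manual = {}, []
    for name, text in records.items():
        result = classify(text)
        if result is None:
            manual.append(name)      # the program knows it is not certain
        else:
            certain[name] = result
    coverage = 100.0 * len(certain) / len(records) if records else 0.0
    return certain, manual, coverage

# Made-up sample records, only to show the shape of the output.
records = {
    "a.txt": "Recommend: Yes",
    "b.txt": "Recommend: no",
    "c.txt": "I'm leaning towards voting Yes, but I am unsure",
    "d.txt": "Recommend: Yes\nRecommend: No",
}
certain, manual, coverage = triage(records)
print(f"{coverage:.1f}% handled with certainty, {len(manual)} left for manual review")
```

The point of the design is that the program never guesses: a file either satisfies the strict rule or lands in the manual pile. Ad-hoc rules can be added later for the manual pile once you have decided how much wrong guessing you can tolerate.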


In reply to Re: Will it work? by Marshall
in thread Perl Possibilities by Gideau
