Natural language processing can be manageable if you confine yourself to a particular problem domain. For example, one of my pet projects that will eventually see the light of day is a module to parse "natural" descriptions of dates for a cron-style scheduler. The idea is that most of the phrases my module has to recognize will either have been copied from the synopsis or will match a few simple patterns.

For my case, no deep understanding is necessary: the only ordinal words that can occur are first, second, third, fourth and fifth, the atomic phrases are simple, and the only composite phrases are atomic phrases joined by before and after. There are some implicit assumptions, such as that "of the month" is implicitly added if no month is given, and that the next date in the future is selected (that is, if no absolute date has been specified, the date lies either in the current month or in the month after that).
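To make the restricted grammar a bit more concrete, here is a minimal sketch of how such phrases might be split and recognized. It is not the module itself: the five ordinals and the before/after connectives come from the description above, but the weekday vocabulary, the "ordinal weekday" shape of an atomic phrase, and the names parse_atomic and parse_phrase are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# The ordinals are the ones listed in the post. The weekday names and the
# "<ordinal> <weekday>" phrase shape are assumptions made for this sketch.
my %ORDINAL = (first => 1, second => 2, third => 3, fourth => 4, fifth => 5);
my %WEEKDAY = (sunday => 0, monday => 1, tuesday => 2, wednesday => 3,
               thursday => 4, friday => 5, saturday => 6);

my $ord_re  = join '|', keys %ORDINAL;
my $wday_re = join '|', keys %WEEKDAY;

# Parse one atomic phrase such as "first monday" or
# "third friday of the month". The trailing "of the month" is optional,
# mirroring the rule that it is implied when no month is given.
# Returns a hash reference, or nothing if the phrase is not recognized.
sub parse_atomic {
    my ($text) = @_;
    return unless $text =~ /^\s*($ord_re)\s+($wday_re)(?:\s+of\s+the\s+month)?\s*$/i;
    return { nth => $ORDINAL{lc $1}, wday => $WEEKDAY{lc $2} };
}

# Parse a composite phrase: atomic phrases joined by "before" or "after".
# Returns an array reference alternating parsed atoms and connectives,
# or nothing if any part fails to parse.
sub parse_phrase {
    my ($text) = @_;

    # The capturing group makes split return the connectives as well.
    my @parts = split /\s+(before|after)\s+/i, $text;

    # A well-formed phrase has an odd number of parts: atom (connective atom)*
    return unless @parts % 2;

    my @result;
    for my $i (0 .. $#parts) {
        if ($i % 2) {
            push @result, lc $parts[$i];
        }
        else {
            my $atom = parse_atomic($parts[$i]) or return;
            push @result, $atom;
        }
    }
    return \@result;
}
```

With these assumptions, parse_phrase("first monday after second friday") yields two parsed atoms with the string 'after' between them, while anything outside the tiny vocabulary, such as "sixth monday", is simply rejected. Turning the parsed structure into an actual date (current month or the one after) is left out here.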

This is by no means a module that "understands" the text given, but with my external knowledge about the supposed content, it can extract and convert the data given.

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

In reply to Re: Natural language text processing by Corion
in thread Natural language text processing by BigLug
