BigLug has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone know of software or a methodology for comprehending natural language? I know it's one of those things that computers can't do really well, but if the subject is limited it can't be too hard. In this case the language is English and the subject is time.

It would be easy to search for strings that I know; however, I might miss some:

  • "... 10 past ..." always refers to minutes
  • "on 9 11" is a date whereas "at 9 11" is a time

    There'd be hundreds, I guess, including the easier and more obvious ones:

  • ".. it was ten past five in the afternoon."

    Does anyone know of anything for doing this?
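A minimal sketch of that string-matching approach in Perl: each rule pairs a pattern with the kind of expression it signals, so a newly noticed phrasing becomes one more row. The rule table, $N and classify() are hypothetical and deliberately tiny; any phrasing that is not listed is simply missed, which is exactly the weakness described above.

```perl
use strict;
use warnings;

# Numbers a rule may mention: digits or a few words (illustrative, incomplete).
my $N = qr/(?:\d{1,2}|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|quarter|half|twenty(?:-five)?)/i;

# Hypothetical rule table: [ pattern, kind ]. New phrasings become new rows.
my @RULES = (
    [ qr/\b$N\s+(?:past|to)\s+$N\b/i,     'time' ],   # "ten past five"
    [ qr/\bat\s+\d{1,2}[\s:.]\d{2}\b/i,   'time' ],   # "at 9 11"
    [ qr/\bon\s+\d{1,2}[\s\/]\d{1,2}\b/i, 'date' ],   # "on 9 11"
);

# Return [kind, matched text] pairs, grouped by rule (rule order, not text order).
sub classify {
    my ($text) = @_;
    my @found;
    for my $rule (@RULES) {
        my ($re, $kind) = @$rule;
        while ($text =~ /$re/g) {
            # @- and @+ hold the start and end offsets of the whole match
            push @found, [ $kind, substr($text, $-[0], $+[0] - $-[0]) ];
        }
    }
    return @found;
}
```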

    Re: Natural language text processing
    by dws (Chancellor) on Jul 08, 2003 at 22:37 UTC
      Does anyone know of software or a methodology for comprehending natural language?

      The old (and, alas, defunct) Perl Journal had several articles on natural language processing. Fortunately, these have been reprinted in Computer Science & Perl Programming: Best of TPJ, which contains many other fine articles.

      You might also get some mileage out of Lingua::LinkParser.

    Re: Natural language text processing
    by blokhead (Monsignor) on Jul 09, 2003 at 01:57 UTC
      I'd start by examining how existing CPAN modules parse natural language. The modules in the Lingua::* namespace might provide some useful starting points.

      Also, off the top of my head: Time::Human kinda does what you're talking about, but in the other direction. Date::Manip accepts a wide range of "natural language" input for dates, so that might also be a great start for you.

      Update: From the Date::Manip POD:

        $date = ParseDate("today");
        $date = ParseDate("1st thursday in June 1992");
        $date = ParseDate("05/10/93");
        $date = ParseDate("12:30 Dec 12th 1880");
        $date = ParseDate("8:00pm december tenth");
      The range of input accepted by this module might eliminate a lot of work for you!

      blokhead

    Re: Natural language text processing
    by TomDLux (Vicar) on Jul 09, 2003 at 02:35 UTC

      This is somewhat difficult because people in Britain use different phrases than people in the US, who differ from Canadians, who disagree with Australians, etc.

      More exasperatingly, Bostonians use different expressions than Denverites, and Tennessee rural folk don't use Valley Speak: it gags them with a spoon.

      Worst of all, every two to five years a new micro-generation comes along with a desperate craving to use expressions their parents won't understand.

      In other words, you're doomed.

      --
      TTTATCGGTCGTTATATAGATGTTTGCA

    Re: Natural language text processing
    by Corion (Patriarch) on Jul 09, 2003 at 07:17 UTC

      Natural language processing can be manageable if you confine yourself to a particular problem domain. For example, one of my pet projects that will eventually see the light of day is a module to parse "natural" date descriptions for a cron-style scheduler. The idea is that most of the descriptions my module has to recognize will either have been copied from the synopsis or will match a few simple patterns:

      • The third wednesday of the month
      • The first monday after the first tuesday
      • Every friday

      In my case no deep understanding is necessary, as the only ordinal words that can occur are first, second, third, fourth and fifth, the atomic phrases are simple, and the only composite phrases are atomic phrases joined by before and after. There are some implicit assumptions: "of the month" is added if no month is given, and the next matching date in the future is selected (that is, if no absolute date has been specified, a date lies either in the current month or in the month after it).

      This is by no means a module that "understands" the text given, but with my external knowledge about the supposed content, it can extract and convert the data given.
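      A minimal sketch of how the two simplest atomic phrases ("the third wednesday of the month", "every friday") could be turned into days of a given month with core Perl only. The helper names (nth_weekday, every_weekday, first_weekday, day_of_week, days_in_month) are hypothetical and not taken from Corion's module; the weekday arithmetic is Sakamoto's day-of-week formula, the year and month are passed in explicitly instead of picking the next future date, and before/after composites are left out.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Corion's module.
my %ORDINAL = (first => 1, second => 2, third => 3, fourth => 4, fifth => 5);
my %WEEKDAY = (sunday => 0, monday => 1, tuesday => 2, wednesday => 3,
               thursday => 4, friday => 5, saturday => 6);
my $ORD_RE = join '|', keys %ORDINAL;
my $DAY_RE = join '|', keys %WEEKDAY;

# Day of week (0 = Sunday) of a Gregorian date, via Sakamoto's method.
sub day_of_week {
    my ($y, $m, $d) = @_;
    my @t = (0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4);
    $y-- if $m < 3;
    my $sum = $y + int($y / 4) - int($y / 100) + int($y / 400) + $t[$m - 1] + $d;
    return $sum % 7;
}

sub days_in_month {
    my ($y, $m) = @_;
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    return 29 if $m == 2 && $leap;
    my @len = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return $len[$m - 1];
}

# Day of month of the first <weekday> in ($y, $m).
# Perl's % with a positive right operand never returns a negative value.
sub first_weekday {
    my ($y, $m, $wday) = @_;
    return 1 + ($wday - day_of_week($y, $m, 1)) % 7;
}

# "the third wednesday [of the month]" -> day of month, or undef if the
# phrase doesn't match or that date doesn't exist in this month.
sub nth_weekday {
    my ($phrase, $y, $m) = @_;
    $phrase =~ /^\s*(?:the\s+)?($ORD_RE)\s+($DAY_RE)(?:\s+of\s+the\s+month)?\s*$/i
        or return;
    my ($n, $wday) = ($ORDINAL{lc $1}, $WEEKDAY{lc $2});
    my $d = first_weekday($y, $m, $wday) + 7 * ($n - 1);
    return $d <= days_in_month($y, $m) ? $d : undef;
}

# "every friday" -> list of all matching days of month in ($y, $m).
sub every_weekday {
    my ($phrase, $y, $m) = @_;
    $phrase =~ /^\s*every\s+($DAY_RE)\s*$/i or return;
    my $wday = $WEEKDAY{lc $1};
    my $last = days_in_month($y, $m);
    my @days;
    for (my $d = first_weekday($y, $m, $wday); $d <= $last; $d += 7) {
        push @days, $d;
    }
    return @days;
}
```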

      perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
      $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
      ($c = $d->accept())->get_request(); $c->send_response( new #in the
      HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
          Does it do Southern? "A week come Sunday"? "Sunday week", "Thursday last", etc.? How about random stuff like "Three moons ago" or "Six fortnights"? This is hopeless.
    Re: Natural language text processing
    by artist (Parson) on Jul 09, 2003 at 03:34 UTC
      Along with the listed modules, try using Parse::RecDescent for the various date-time text formats that you may come across. You can keep adding more formats as you see new variations.
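      Parse::RecDescent builds a recursive-descent parser from a grammar you write, so each new format becomes a new rule. To keep this sketch free of non-core dependencies, it is a hand-written recursive-descent parser in plain Perl rather than Parse::RecDescent's own grammar syntax. The grammar, the word list, parse_phrase() and find_time() are hypothetical and cover only a few "N past/to H" and "H o'clock" forms, returning a 12-hour (hour, minute) pair.

```perl
use strict;
use warnings;

# Number words accepted by the <number> rule (illustrative, far from complete).
my %NUM = (
    one => 1, two => 2, three => 3, four => 4, five => 5, six => 6,
    seven => 7, eight => 8, nine => 9, ten => 10, eleven => 11,
    twelve => 12, twenty => 20, 'twenty-five' => 25,
);

# Hypothetical grammar, one closure per rule:
#   phrase   : offset relation hour | hour "o'clock"
#   offset   : "quarter" | "half" | number ["minutes"]
#   relation : "past" | "to"
#   hour     : number            (1..12 only)
# Tries to parse at token index $pos; returns (hour, minute) or an empty list.
sub parse_phrase {
    my ($tok, $pos) = @_;
    my $start = $pos;

    my $word = sub {                     # match one literal token
        return 0 unless ($tok->[$pos] // '') eq $_[0];
        $pos++;
        return 1;
    };
    my $number = sub {                   # digits or a known number word
        my $t = $tok->[$pos] // '';
        my $n = $t =~ /^\d+$/ ? $t : $NUM{$t};
        return unless defined $n;
        $pos++;
        return $n;
    };
    my $hour = sub {
        my $h = $number->();
        return $h if defined $h && $h >= 1 && $h <= 12;
        return;
    };

    # Alternative 1: offset relation hour
    my $min;
    if    ($word->('quarter'))           { $min = 15 }
    elsif ($word->('half'))              { $min = 30 }
    elsif (defined(my $n = $number->())) { $min = $n; $word->('minutes') }
    if (defined $min && $min > 0 && $min < 60) {
        if ($word->('past')) {
            my $h = $hour->();
            return ($h, $min) if defined $h;
        }
        elsif ($word->('to')) {
            my $h = $hour->();
            return ($h == 1 ? 12 : $h - 1, 60 - $min) if defined $h;
        }
    }

    # Alternative 2: backtrack to the start, then hour "o'clock"
    $pos = $start;
    my $h = $hour->();
    return ($h, 0) if defined $h && $word->("o'clock");
    return;
}

# Scan free text: try the grammar at every token position, keep the first hit.
sub find_time {
    my @tok = lc($_[0]) =~ /([a-z0-9'-]+)/g;
    for my $i (0 .. $#tok) {
        my @hm = parse_phrase(\@tok, $i);
        return @hm if @hm;
    }
    return;
}
```

      Supporting a new variation means adding a rule and an alternative by hand here; with Parse::RecDescent the same rules would be written declaratively in its grammar instead.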

      artist

    Re: Natural language text processing
    by allolex (Curate) on Jul 12, 2003 at 05:32 UTC
      I know it's one of those things that computers can't do really well, but if the subject is limited it can't be too hard.

      Very funny :) For a start, have a look at the Natural Language Processing FAQ. If you'd like an introduction, have a look at James Allen's book.

      --
      Allolex
