
Does it work for keyphrases too?
Do you use any dictionary, thesaurus, a priori training, etc.?

Re^3: Keywords and keyphrases extraction from text
by planetscape (Chancellor) on May 13, 2009 at 18:53 UTC

    Mine is a pretty simplistic system. Reading the docs for Lingua::EN::Keywords and Lingua::EN::Summarize will give you an idea of some of the limitations. However, for the task at hand at the time, it worked well enough.

    I've put together a small example, using your sample input, by cutting'n'pasting from the program I wrote back in Sept. 2005 to show you. Is it pretty, or would I write things the same way today? No and probably not, but it should suffice to illustrate:

        use strict;
        use warnings;
        use Lingua::EN::Keywords;
        use Lingua::EN::Summarize;
        use Lingua::StopWords;

        # Sample input from the original question, kept verbatim
        # (including the stray spaces before the commas).
        my $allcontent =
              'Sky Travel Executives provide a rapid and reliable Airport '
            . 'Transfer service which specializes in catering for airport '
            . 'taxi transfers to and from all major London airports. '
            . '24 Hours executive cars and luxury 6/7 seater mini vans '
            . 'available at Heathrow Airport , Gatwick Airport , '
            . 'Stansted Airport , Luton Airport and City Airport. ';

        # Extract and list the keywords.
        my @keywords = keywords($allcontent);
        print "Keywords:\n";
        print "=========\n";
        foreach my $keyword (@keywords) {
            print $keyword, "\n";
        }
        print "\n";

        # Summarize the same text.
        my $summary = summarize( $allcontent );
        print "Summary:\n";
        print "========\n";
        print $summary;
        print "\n\n";

    It prints no summary for your text (too short?) and the keywords it picks are less than optimal.

        Keywords:
        =========
        airport
        major
        london
        airports.
        mini
        cars
        travel
        seater

        Summary:
        ========

    I'd experiment with longer inputs, myself, and/or some system of weighting certain keywords.
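
    To give a rough idea of what weighting might look like, here is a minimal sketch in plain Perl. It is not taken from my 2005 program and does not use the Lingua modules. The %weight table, its terms and numbers, and the weighted_keywords() helper are all made up for illustration. It needs Perl 5.10 or later for the // operator, and it does no stop-word filtering.

```perl
use strict;
use warnings;

# Hypothetical weights: these terms and numbers are invented for illustration.
my %weight = (
    airport  => 3,
    transfer => 3,
    taxi     => 2,
);

# Score each distinct word by summing its weight once per occurrence;
# words missing from the weight table count as 1.
sub weighted_keywords {
    my ($text, $weights) = @_;
    my %score;
    for my $word ( map { lc $_ } $text =~ /([A-Za-z]+)/g ) {
        $score{$word} += $weights->{$word} // 1;
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
}

my @ranked = weighted_keywords(
    'Airport taxi transfers to and from all major London airports',
    \%weight,
);
print "$_\n" for @ranked[ 0 .. 2 ];
```

    Matching is exact, so "transfers" does not pick up the weight given to "transfer"; some stemming would be needed before this is much use on real text.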

    You may also find helpful some of Ted Pedersen's work, which I've discussed before.

    HTH,

    planetscape
      Thanks a lot,
      Am I right that, to run your code and experiment with it, all I need to do is install these 3 modules (and possibly the modules they depend on)?

      I also came across Ted Pedersen's site, but I could not tell whether it only does statistical analysis or whether it can solve my entire problem.

      What I actually want is to split the text into meaningful parts. I can then filter those down to the real keywords for the subject using classification, which I am good at.
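
      To make the splitting step concrete, here is a minimal sketch of one simple way to do it: cut the text at punctuation and at stop words, and treat each remaining run of words as a candidate phrase. The %stop list and the candidate_phrases() name are placeholders for illustration only; a real stop list would be far longer, and this is just one possible approach, not what any of the modules above do.

```perl
use strict;
use warnings;

# Tiny illustrative stop list; a real one would be far longer.
my %stop = map { $_ => 1 } qw(a all and at for from in the to which);

sub candidate_phrases {
    my ($text) = @_;
    my @phrases;
    # Punctuation always ends a candidate phrase.
    for my $chunk ( split /[.,;:!?]+/, lc $text ) {
        my @current;
        for my $word ( split ' ', $chunk ) {
            if ( $stop{$word} ) {
                # A stop word closes the phrase collected so far.
                push @phrases, join( ' ', @current ) if @current;
                @current = ();
            }
            else {
                push @current, $word;
            }
        }
        # Flush whatever is left at the end of the chunk.
        push @phrases, join( ' ', @current ) if @current;
    }
    return @phrases;
}

print "$_\n" for candidate_phrases(
    'Luxury mini vans available at Heathrow Airport, Gatwick Airport and City Airport.'
);
```

      The candidates this produces would then go through the classification step to keep only the real keywords.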

      One last question: I can't figure out how to turn on e-mail notifications to my private e-mail address for when I get a forum reply.