in reply to Keywords and keyphrases extraction from text

Recently, when I needed to do some keywording/summarizing, I hacked together a wee little script using (among other things):

... and it worked pretty well, actually. :-)

HTH,

planetscape

Re^2: Keywords and keyphrases extraction from text
by vit (Friar) on May 13, 2009 at 16:06 UTC
    Does it work for keyphrases too?
    Do you use any dictionary, thesaurus, a priori training, etc.?

      Mine is a pretty simplistic system. Reading the docs for Lingua::EN::Keywords and Lingua::EN::Summarize will give you an idea of some of the limitations. However, for the task at hand at the time, it worked well enough.

      I've put together a small example, using your sample input, by cutting'n'pasting from the program I wrote back in Sept. 2005 to show you. Is it pretty, or would I write things the same way today? No and probably not, but it should suffice to illustrate:

      use strict;
      use warnings;

      use Lingua::EN::Keywords;
      use Lingua::EN::Summarize;
      use Lingua::StopWords;    # loaded by the original program; not used in this excerpt

      # Sample input from the question, kept verbatim
      my $allcontent =
          'Sky Travel Executives provide a rapid and reliable Airport Transfer '
        . 'service which specializes in catering for airport taxi transfers to '
        . 'and from all major London airports. 24 Hours executive cars and '
        . 'luxury 6/7 seater mini vans available at Heathrow Airport , Gatwick '
        . 'Airport , Stansted Airport , Luton Airport and City Airport. ';

      # Extract keywords and print them one per line
      my @keywords = keywords($allcontent);

      print "Keywords:\n";
      print "=========\n";
      foreach my $keyword (@keywords) {
          print $keyword, "\n";
      }
      print "\n";

      # Summarize the same text
      my $summary = summarize( $allcontent );

      print "Summary:\n";
      print "========\n";
      print $summary;
      print "\n\n";

      It prints no summary for your text (too short?) and the keywords it picks are less than optimal.

      Keywords:
      =========
      airport
      major
      london
      airports.
      mini
      cars
      travel
      seater

      Summary:
      ========

      I'd experiment with longer inputs, myself, and/or some system of weighting certain keywords.
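To illustrate what "weighting certain keywords" could look like, here is a minimal sketch in core Perl only. It is not the program discussed above and does not use the Lingua modules: the stopword list, the `%boost` table, and the `weighted_keywords` function are all hypothetical, made up for this example. Each non-stopword is scored by its frequency, multiplied by a boost factor where one is defined.

```perl
use strict;
use warnings;

# Hypothetical stopword list and boost table, for illustration only.
my %stop  = map { $_ => 1 } qw(a an and at the to from for in of which all);
my %boost = ( airport => 2, transfer => 2, taxi => 1.5 );

# Score each non-stopword: every occurrence adds its boost (default 1).
sub weighted_keywords {
    my ($text) = @_;
    my @words = map { lc $_ } ( $text =~ /([A-Za-z]+)/g );
    my %score;
    for my $word (@words) {
        next if $stop{$word};
        $score{$word} += $boost{$word} // 1;
    }
    return \%score;
}

my $scores = weighted_keywords(
    'Airport taxi transfers to and from all major London airports.'
);

# Print highest score first; ties are broken alphabetically.
for my $word ( sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } keys %$scores ) {
    printf "%-10s %.1f\n", $word, $scores->{$word};
}
```

The boost table is where domain knowledge would go; with longer inputs the frequency part starts to matter more than it does for a single sentence.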

      You may also find helpful some of Ted Pedersen's work, which I've discussed before.

      HTH,

      planetscape
        Thanks a lot.
        Am I right that all I need to do to run your code and experiment with it is install these 3 modules (and possibly the modules they depend on)?
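One way to check whether the three modules named in the script can be loaded is a small sketch like the following. The `missing_modules` helper is hypothetical and written for this example; it only reports whether each module loads from the current `@INC` and says nothing about which dependencies a module needs.

```perl
use strict;
use warnings;

# Return the names of the given modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn Foo::Bar into Foo/Bar.pm so require can take it as a file name.
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(
    qw(Lingua::EN::Keywords Lingua::EN::Summarize Lingua::StopWords)
);
print @missing
    ? "Missing: @missing\n"
    : "All three modules are installed.\n";
```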

        I also ran into Ted Pedersen's site, but I did not understand whether it only does statistical analysis or can solve my entire problem.

        Actually, what I want is to split the text into meaningful parts; then I can keep only those that are real keywords for the subject by using classification, which I am good at.
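As a rough illustration of "splitting text into meaningful parts", the sketch below breaks the text at punctuation and at stopwords and keeps each run of consecutive non-stopwords as a candidate phrase. This is only one simple way to do it, not something taken from the modules mentioned in this thread; the stopword list is deliberately tiny and the `candidate_phrases` function is a hypothetical name for this example. The resulting candidates could then be passed to a classifier.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny stopword list; a real one would be much larger.
my %stop = map { $_ => 1 } qw(a an and at the to from for in of which all);

# Split text into candidate phrases: runs of consecutive non-stopwords,
# broken at punctuation and at stopwords.
sub candidate_phrases {
    my ($text) = @_;
    my @phrases;
    for my $chunk ( split /[.,;:!?()]+/, lc $text ) {
        my @run;
        for my $word ( split ' ', $chunk ) {
            if ( $stop{$word} ) {
                # A stopword closes the current run, if there is one.
                push @phrases, join( ' ', @run ) if @run;
                @run = ();
            }
            else {
                push @run, $word;
            }
        }
        # Flush whatever is left at the end of the chunk.
        push @phrases, join( ' ', @run ) if @run;
    }
    return @phrases;
}

print "$_\n" for candidate_phrases(
    'Executive cars and luxury mini vans available at Heathrow Airport , '
  . 'Gatwick Airport and City Airport.'
);
```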

        One last question: I can't figure out how to turn on e-mail notifications to my private e-mail address for when I get a forum reply.