Mine is a pretty simplistic system. Reading the docs for Lingua::EN::Keywords and Lingua::EN::Summarize will give you an idea of some of the limitations. However, for the task at hand at the time, it worked well enough.
I've put together a small example, using your sample input, by cutting'n'pasting from a program I wrote back in September 2005. Is it pretty, and would I write things the same way today? No, and probably not, but it should suffice to illustrate:
    use strict;
    use warnings;

    use Lingua::EN::Keywords;
    use Lingua::EN::Summarize;
    use Lingua::StopWords;

    # Sample input from the original question
    my $allcontent = 'Sky Travel Executives provide a rapid and reliable Airport Transfer service which specializes in catering for airport taxi transfers to and from all major London airports. 24 Hours executive cars and luxury 6/7 seater mini vans available at Heathrow Airport , Gatwick Airport , Stansted Airport , Luton Airport and City Airport. ';

    # Extract and print the keywords, one per line
    my @keywords = keywords($allcontent);

    print "Keywords:\n";
    print "=========\n";
    foreach my $keyword (@keywords) {
        print $keyword, "\n";
    }
    print "\n";

    # Summarize the same text
    my $summary = summarize( $allcontent );

    print "Summary:\n";
    print "========\n";
    print $summary;
    print "\n\n";
It prints no summary for your text (too short?) and the keywords it picks are less than optimal.
    Keywords:
    =========
    airport
    major
    london
    airports.
    mini
    cars
    travel
    seater

    Summary:
    ========
I'd experiment with longer inputs, myself, and/or some system of weighting certain keywords.
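As a rough sketch of what "weighting certain keywords" could look like, here is a core-Perl-only example that scores words by frequency and multiplies in a hand-picked weight. Everything in it is an assumption made for illustration: `weighted_keywords` is a hypothetical helper (not part of Lingua::EN::Keywords), the weights are made up, and the tiny inline stop list is only a stand-in for a real one. It also strips punctuation before counting, so a token such as "airports." is counted as "airports".

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any CPAN module: score each word by how
# often it occurs, counting each occurrence as its weight (default 1).
sub weighted_keywords {
    my ($text, $weights, $stop, $limit) = @_;
    my %score;

    # Pull out alphabetic tokens only, so trailing punctuation is dropped.
    for my $word (map { lc($_) } $text =~ /[A-Za-z][A-Za-z'-]*/g) {
        next if $stop->{$word} || length($word) < 3;
        $score{$word} += exists $weights->{$word} ? $weights->{$word} : 1;
    }

    # Highest score first; ties are broken alphabetically for stable output.
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    $#ranked = $limit - 1 if @ranked > $limit;

    return map { [ $_, $score{$_} ] } @ranked;
}

# Illustrative values only: a stand-in stop list and made-up weights.
my %stop    = map { $_ => 1 } qw(and for from the which all);
my %weights = ( transfer => 3, taxi => 3 );

# Shortened excerpt of the sample input.
my $text = 'Sky Travel Executives provide a rapid and reliable Airport '
         . 'Transfer service which specializes in catering for airport taxi '
         . 'transfers to and from all major London airports.';

for my $pair (weighted_keywords($text, \%weights, \%stop, 5)) {
    printf "%-12s %d\n", @$pair;
}
```

The weights hash is where domain knowledge goes: terms you already know matter for the site get a boost, while everything else competes on raw frequency.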
You may also find some of Ted Pedersen's work (which I've discussed before) helpful.
HTH,
planetscape