Mine is a pretty simplistic system. Reading the docs for Lingua::EN::Keywords and Lingua::EN::Summarize will give you an idea of some of the limitations. However, for the task at hand at the time, it worked well enough.
I've put together a small example, using your sample input, by cutting'n'pasting from the program I wrote back in Sept. 2005 to show you. Is it pretty, or would I write things the same way today? No and probably not, but it should suffice to illustrate:
use strict;
use warnings;
use Lingua::EN::Keywords;
use Lingua::EN::Summarize;
use Lingua::StopWords;
my $allcontent = 'Sky Travel Executives provide a rapid and reliable Airport Transfer service which specializes in catering for airport taxi transfers to and from all major London airports. 24 Hours executive cars and luxury 6/7 seater mini vans available at Heathrow Airport , Gatwick Airport , Stansted Airport , Luton Airport and City Airport. ';
my @keywords = keywords($allcontent);
print "Keywords:\n";
print "=========\n";
foreach my $keyword (@keywords) {
    print $keyword, "\n";
}
print "\n";
my $summary = summarize( $allcontent );
print "Summary:\n";
print "========\n";
print $summary;
print "\n\n";
It prints no summary for your text (too short?) and the keywords it picks are less than optimal.
Keywords:
=========
airport
major london airports.
mini
cars
travel
seater
Summary:
========
I'd experiment with longer inputs, myself, and/or some system of weighting certain keywords.
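As a rough illustration of what I mean by weighting, here is a minimal sketch in plain Perl. It is not how Lingua::EN::Keywords works internally; it just counts words and lets hand-picked terms count for more. The weighted_keywords sub, the tiny stop list and the weights are all made up for this example, so treat them as assumptions to tune for your own data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): score each word by how
# often it appears, counting hand-weighted terms more heavily.
sub weighted_keywords {
    my ($text, $weights, $stop) = @_;
    my %score;
    for my $word (lc($text) =~ /[a-z]+/g) {
        next if $stop->{$word};                    # skip stop words
        $score{$word} += $weights->{$word} // 1;   # default weight is 1
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
}

# Assumed stop list and weights, chosen only for this example.
my %stop    = map { $_ => 1 } qw(a an and at for from in of the to which);
my %weights = ( transfer => 3, transfers => 3, taxi => 3 );

my $text = 'Airport Transfer service which specializes in airport taxi '
         . 'transfers to and from all major London airports.';

my @ranked = weighted_keywords($text, \%weights, \%stop);
print "$_\n" for @ranked[0 .. 2];
```

With those weights, taxi, transfer and transfers come out ahead of airport even though airport appears twice. A real stop list would be much longer than the one here.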
You may also find helpful some of Ted Pedersen's work, which I've discussed before.