Mine is a pretty simplistic system. Reading the docs for Lingua::EN::Keywords and Lingua::EN::Summarize will give you an idea of some of the limitations. However, for the task at hand at the time, it worked well enough.

I've put together a small example, using your sample input, by cutting'n'pasting from the program I wrote back in Sept. 2005 to show you. Is it pretty, or would I write things the same way today? No and probably not, but it should suffice to illustrate:

    use strict;
    use warnings;

    use Lingua::EN::Keywords;
    use Lingua::EN::Summarize;
    use Lingua::StopWords;

    my $allcontent = 'Sky Travel Executives provide a rapid and reliable Airport Transfer service which specializes in catering for airport taxi transfers to and from all major London airports. 24 Hours executive cars and luxury 6/7 seater mini vans available at Heathrow Airport , Gatwick Airport , Stansted Airport , Luton Airport and City Airport. ';

    my @keywords = keywords($allcontent);

    print "Keywords:\n";
    print "=========\n";
    foreach my $keyword (@keywords) {
        print $keyword, "\n";
    }
    print "\n";

    my $summary = summarize( $allcontent );

    print "Summary:\n";
    print "========\n";
    print $summary;
    print "\n\n";

It prints no summary for your text (too short?) and the keywords it picks are less than optimal.

    Keywords:
    =========
    airport
    major
    london
    airports.
    mini
    cars
    travel
    seater

    Summary:
    ========

I'd experiment with longer inputs, myself, and/or some system of weighting certain keywords.
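To make the weighting idea concrete, here is a minimal sketch in plain Perl. It is not what Lingua::EN::Keywords does internally; it simply scores each word as frequency times a weight. The stop list, the %boost table, and the weighted_keywords name are all made up for illustration (Lingua::StopWords could supply a fuller stop list), so treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;

# Hypothetical stop list, for illustration only.
my %stop = map { $_ => 1 } qw(a an and at for from in of the to which all);

# Hypothetical per-term weights: domain terms we want ranked higher.
my %boost = ( transfer => 3, taxi => 3, heathrow => 2, gatwick => 2 );

# Score each word as (frequency x weight) and return the top $n terms.
sub weighted_keywords {
    my ( $text, $n ) = @_;
    my %score;

    # Lowercase every alphabetic run; skip stop words and very short words.
    for my $word ( map { lc $_ } $text =~ /([A-Za-z]+)/g ) {
        next if $stop{$word} || length($word) < 3;
        $score{$word} += $boost{$word} // 1;    # unlisted words weigh 1
    }

    # Highest score first; ties broken alphabetically for stable output.
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    $n = @ranked if $n > @ranked;
    return @ranked[ 0 .. $n - 1 ];
}

my $text = 'Airport Transfer service for airport taxi transfers to and from '
         . 'all major London airports, including Heathrow Airport and Gatwick Airport.';

print "$_\n" for weighted_keywords( $text, 5 );
```

Tuning the %boost values (or deriving them from a list of terms you care about) is where the experimenting would happen; note that without stemming, "transfer" and "transfers" are counted separately.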

You may also find helpful some of Ted Pedersen's work, which I've discussed before.

HTH,

planetscape

In reply to Re^3: Keywords and keyphrases extraction from text by planetscape
in thread Keywords and keyphrases extraction from text by vit
