rmoriz has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Maybe some of you know the eBay search functionality; for an example, see http://tinyurl.com/p3acd

When you search for a term, for example 'The Visual Display of Quantitative Information', you get a number of subqueries built from parts of the original search phrase.

See the part under "Get more results with fewer keywords:".

Is there already a CPAN module that implements this algorithm, or is this just a simple "xor"-like thing?

Thank you!

Replies are listed 'Best First'.
Re: Search: "Get more results with fewer keywords"
by sgifford (Prior) on Oct 09, 2006 at 04:59 UTC
    A few hints to get you pointed in the right direction:
    • Words like "the" and "of" are called stop words. You can eliminate them first, either by compiling your own list of stop words or by using something like Lingua::StopWords.
    • A way to choose n items from a set of m is called a combination, and a module like Math::Combinatorics can help you find these combinations.
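
    The two hints above can be sketched together as follows. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical one, and the hand-written combinations() sub stands in for a module such as Math::Combinatorics so that the example runs with core Perl only. The names combinations and fewer_keyword_queries are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stop-word list for illustration; a real one would be longer
# (or come from a module such as Lingua::StopWords).
my %is_stop = map { $_ => 1 } qw(the of in on with by and a an);

# Return every way of choosing $k items from @items, preserving order.
# Each combination is returned as an array reference.
sub combinations {
    my ($k, @items) = @_;
    return [] if $k == 0;        # exactly one way to choose nothing
    return () if $k > @items;    # impossible: not enough items left
    my ($first, @rest) = @items;
    return (
        ( map { [ $first, @$_ ] } combinations($k - 1, @rest) ),  # with $first
        combinations($k, @rest),                                   # without it
    );
}

# Build the reduced queries obtained by dropping exactly one keyword.
sub fewer_keyword_queries {
    my ($phrase) = @_;
    my @keywords = grep { !$is_stop{lc $_} } split ' ', $phrase;
    return () if @keywords < 2;
    return map { join ' ', @$_ } combinations(@keywords - 1, @keywords);
}

print "$_\n"
    for fewer_keyword_queries('The Visual Display of Quantitative Information');
# Prints:
#   Visual Display Quantitative
#   Visual Display Information
#   Visual Quantitative Information
#   Display Quantitative Information
```

    Passing a smaller first argument to combinations() would give the queries with two or more keywords dropped.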

    Good luck!

Re: Search: "Get more results with fewer keywords"
by Not_a_Number (Prior) on Oct 09, 2006 at 11:50 UTC

    Hadn't got much to do, so I thought I'd give it a stab:

    use strict;
    use warnings;
    use Math::Combinatorics;

    # Probably best to get stopwords from a file, but for testing:
    my @stopwords = qw/ the of in on with by and /;
    my %stop_hash = map { $_ => 1 } @stopwords;

    my $term = 'Programming Perl, by Wall, Christiansen and Orwant';
    my @term = grep { not $stop_hash{lc $_} } split ' ', $term;

    # Default highlighting combination: n-1 from n (3 from 4, 5 from 6..)
    my $n_from = shift || @term - 1;

    my @combos = combine( $n_from, @term );
    my $result = highlight( \@combos, $term );
    print join "\n", @$result;

    sub highlight {
        my $to_highlight = shift;
        my $term = shift;
        my @highlighted;
        foreach my $combo ( @$to_highlight ) {
            my $str = $term;
            # \Q...\E keeps regex metacharacters in a keyword literal.
            # For HTML output instead of caps:
            # $str =~ s{\Q$_\E}{<em>$_</em>} for @$combo;
            # Highlight with Caps for console output
            $str =~ s/\Q$_\E/\U$_/ for @$combo;
            push @highlighted, $str;
        }
        return \@highlighted;
    }

    Output

    PROGRAMMING PERL, by WALL, CHRISTIANSEN and Orwant
    PROGRAMMING PERL, by WALL, Christiansen and ORWANT
    PROGRAMMING PERL, by Wall, CHRISTIANSEN and ORWANT
    PROGRAMMING Perl, by WALL, CHRISTIANSEN and ORWANT
    Programming PERL, by WALL, CHRISTIANSEN and ORWANT
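
    Because the phrase is split on whitespace, the keywords keep their trailing commas ("PERL,"), which is fine for highlighting but probably not what you would send to a search engine. A possible variation, sketched here as an assumption rather than as part of the code above, extracts bare words with a regex and highlights them only at word boundaries. The sub names keywords and highlight_words are hypothetical.

```perl
use strict;
use warnings;

# Same small test list of stop words as in the code above.
my %is_stop = map { $_ => 1 } qw(the of in on with by and);

# Extract keywords without punctuation, dropping stop words.
sub keywords {
    my ($phrase) = @_;
    return grep { !$is_stop{lc $_} } $phrase =~ /(\w+)/g;
}

# Upper-case each given keyword wherever it appears as a whole word.
sub highlight_words {
    my ($phrase, @words) = @_;
    for my $word (@words) {
        $phrase =~ s/\b(\Q$word\E)\b/\U$1/g;
    }
    return $phrase;
}

my $phrase = 'Programming Perl, by Wall, Christiansen and Orwant';
my @kw     = keywords($phrase);

print join(' ', @kw), "\n";
# Prints: Programming Perl Wall Christiansen Orwant

print highlight_words($phrase, @kw[0 .. 2]), "\n";
# Prints: PROGRAMMING PERL, by WALL, Christiansen and Orwant
```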
Re: Search: "Get more results with fewer keywords"
by planetscape (Chancellor) on Oct 09, 2006 at 19:35 UTC