Hello,

I want to compare a list of keywords against multiple other lists and get, as output, the number of matches with each individual list, plus how many keywords appear in 2 lists, 3 lists, and so on. The keywords come from text files (separated by \n, \t, ',' or ';'), but could also come from a DBI database in the future. It must scale, as each list can hold from a thousand to a hundred thousand keywords.

Reading from a text file seems easy, though I'm not sure about performance:
http://www.perlmonks.org/?node_id=45868
http://stackoverflow.com/questions/761392/easiest-way-to-open-a-text-file-and-read-it-into-an-array-with-perl
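For the file-reading part, a minimal sketch might look like the following. It assumes the separators listed above (newline, tab, comma, semicolon); the helper name read_keywords is made up for illustration, and the demo reads from an in-memory string instead of a real file. It does not trim spaces around keywords.

```perl
use strict;
use warnings;

# Hypothetical helper: read keywords from an open filehandle, splitting on
# newline, tab, comma and semicolon; empty fields are dropped.
sub read_keywords {
    my ($fh) = @_;
    my @keywords;
    while (my $line = <$fh>) {
        push @keywords, grep { length } split /[\n\t,;]+/, $line;
    }
    return @keywords;
}

# Demo on an in-memory file so the sketch runs without any real file.
my $data = "abel,abel;baker\tcamera\ndelta\n\nedward;\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @al = read_keywords($fh);
close $fh;
print scalar(@al), " keywords: @al\n";
```

Reading line by line keeps memory use proportional to the keyword list rather than to the whole file plus the list, which may matter at the hundred-thousand scale.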

While googling for list comparison, I found these two interesting solutions:
http://stackoverflow.com/questions/720482/how-can-i-verify-that-a-value-is-present-in-an-array-list-in-perl
http://search.cpan.org/~jkeenan/List-Compare-0.37/lib/List/Compare.pm#Multiple_Case:_Compare_Three_or_More_Lists

List::Compare seems the most promising; I just have to optimise the text-file-to-array part.

use strict;
use warnings;
use List::Compare;

## Al is the reference list, compared against the others
my %lists = (
    Al     => [qw(abel abel baker camera delta edward fargo golfer jerky)],
    Bob    => [qw(baker camera delta delta edward fargo golfer hilton)],
    Carmen => [qw(fargo golfer hilton icon icon jerky kappa)],
    Don    => [qw(fargo icon jerky)],
    Ed     => [qw(fargo icon icon jerky)],
);
my @names = qw(Al Bob Carmen Don Ed);

## keywords present in all five lists
my $lcm = List::Compare->new(@lists{@names});
my $all = () = $lcm->get_intersection;

## number of keywords Al shares with each of the other lists
my %count;
for my $name (@names[1 .. $#names]) {
    my $lc = List::Compare->new($lists{Al}, $lists{$name});
    $count{$name} = () = $lc->get_intersection;
}

## how to get the count of keywords which are in 2 lists, 3 lists, ... ?

my $out = join ' ', map { "count-$_:$count{$_}" } @names[1 .. $#names];
print "$out all:$all\n";

But how do I get the count of keywords that appear in multiple lists, so that the output is
count-Bob:6 count-Carmen:3 count-Don:2 count-Ed:2 count2+:0 count3+:2

count3+ being the number of keywords that are in at least 3 lists.
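One possible reading of "in at least N lists" can be sketched with plain hashes, as below. This is only an assumption about the intended definition: for each distinct keyword of the reference list it counts how many of the other lists contain that keyword. The helper name lists_per_keyword is hypothetical. Under this reading the sample data gives count3+:2, as in the output above, but count2+:3 rather than 0, so the definition being asked for may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: for each distinct keyword of the reference list, count
# how many of the other lists contain it (duplicates within a list count once).
sub lists_per_keyword {
    my ($ref_list, @others) = @_;
    my %hits = map { $_ => 0 } @$ref_list;
    for my $other (@others) {
        my %seen;
        for my $kw (grep { !$seen{$_}++ } @$other) {
            $hits{$kw}++ if exists $hits{$kw};
        }
    }
    return %hits;
}

my @Al     = qw(abel abel baker camera delta edward fargo golfer jerky);
my @Bob    = qw(baker camera delta delta edward fargo golfer hilton);
my @Carmen = qw(fargo golfer hilton icon icon jerky kappa);
my @Don    = qw(fargo icon jerky);
my @Ed     = qw(fargo icon icon jerky);

my %hits = lists_per_keyword(\@Al, \@Bob, \@Carmen, \@Don, \@Ed);
for my $n (2, 3) {
    # grep in scalar context returns the number of matching values
    my $count = grep { $_ >= $n } values %hits;
    print "count$n+:$count\n";
}
```

Each keyword is looked up once in a hash, so the work grows roughly linearly with the total number of keywords, which should suit lists of a hundred thousand entries.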

Thanks a lot. Cheers


In reply to compare a list against multiple lists by raiten
