Hello,
I want to compare a list of keywords against multiple other lists and report how many keywords match each individual list, and how many appear in 2 lists, 3 lists, and so on. The keywords come from a text file (separated by \n, \t, "," or ";"), but could also come from a DBI database in the future. The solution must scale, as each list can hold from a thousand to a hundred thousand keywords.
Reading from a text file seems easy, but I am not sure about performance:
http://www.perlmonks.org/?node_id=45868
http://stackoverflow.com/questions/761392/easiest-way-to-open-a-text-file-and-read-it-into-an-array-with-perl
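For the file-reading part, a minimal sketch could look like the following. It assumes one plain text file per list and treats any run of newline, tab, comma or semicolon as a separator; the name `read_keywords` is hypothetical and not taken from either link above.

```perl
use strict;
use warnings;

# Hypothetical helper: read all keywords from one text file.
# Assumption: keywords are separated by newline, tab, comma or semicolon.
sub read_keywords {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @keywords;
    while (my $line = <$fh>) {
        # Split on any run of separators; grep drops empty fields
        # (for example from a line that starts with a separator).
        push @keywords, grep { length } split /[\r\n\t,;]+/, $line;
    }
    close $fh;
    return @keywords;
}

# Usage (file name is only an example):
# my @Al = read_keywords('al.txt');
```

Reading line by line keeps memory use proportional to the keyword list itself rather than to the list plus a second copy of the raw file, which may matter at the hundred-thousand-keyword scale.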
While googling for list comparison, I found these two interesting solutions:
http://stackoverflow.com/questions/720482/how-can-i-verify-that-a-value-is-present-in-an-array-list-in-perl
http://search.cpan.org/~jkeenan/List-Compare-0.37/lib/List/Compare.pm#Multiple_Case:_Compare_Three_or_More_Lists
List::Compare seems the most promising; I just have to optimise the text-file-to-array part.
But how can I extend it to count keywords across multiple lists? Here is what I have so far:

    use List::Compare;

    ## Al is the reference list, compared against the others
    @Al     = qw(abel abel baker camera delta edward fargo golfer jerky);
    @Bob    = qw(baker camera delta delta edward fargo golfer hilton);
    @Carmen = qw(fargo golfer hilton icon icon jerky kappa);
    @Don    = qw(fargo icon jerky);
    @Ed     = qw(fargo icon icon jerky);
    my %list = (0 => 'Al', 1 => 'Bob', 2 => 'Carmen', 3 => 'Don', 4 => 'Ed');

    ## keywords common to all five lists
    $lcm = List::Compare->new(\@Al, \@Bob, \@Carmen, \@Don, \@Ed);
    if (@intersectionAll = $lcm->get_intersection) {
        $all = (@intersectionAll);
    }

    ## pairwise intersection of Al with each other list (symbolic references)
    for (my $j = 1; $j < 5; ++$j) {
        $lcm0 = List::Compare->new(\@{$list{0}}, \@{$list{$j}});
        $intername = "intersection-0-$j";
        if (@{$intername} = $lcm0->get_intersection) {
            ${"count-$intername"} = (@{$intername});
        }
    }

    ## how to get the count of keywords which are in 2 lists, 3 lists, ...?
    my $out = "";
    for (my $k = 1; $k < 5; ++$k) {
        $out .= "count-$list{$k}:".${"count-intersection-0-$k"}." ";
    }
    $out .= " all:$all\n";
    print $out;

The output I would like is:
count-Bob:6 count-Carmen:3 count-Don:2 count-Ed:2 count2+:0 count3+:2
where count3+ is the number of keywords that appear in at least 3 lists.
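One possible way to get such counts without List::Compare is to tally, for each keyword of the reference list, how many lists contain it, using plain hashes. The sketch below is only one interpretation: it assumes "in at least N lists" counts the reference list itself, and that duplicates within a list count once. The name `overlap_counts` is hypothetical. Under that assumption the countN+ figures will not necessarily equal the sample line above, so the definition may need adjusting.

```perl
use strict;
use warnings;

# Hypothetical helper. Arguments: an array ref (reference list) and a
# hash ref mapping list names to array refs (the lists to compare against).
# Assumption: "in at least N lists" includes the reference list itself.
sub overlap_counts {
    my ($ref_list, $others) = @_;

    my %in_ref = map { $_ => 1 } @$ref_list;   # de-duplicated reference list
    my %seen_in = map { $_ => 1 } keys %in_ref; # keyword => lists containing it
    my %per_list;                               # list name => overlap with reference

    for my $name (keys %$others) {
        my %uniq = map { $_ => 1 } @{ $others->{$name} };
        $per_list{$name} = 0;
        for my $kw (keys %uniq) {
            next unless $in_ref{$kw};           # only reference keywords matter
            $per_list{$name}++;
            $seen_in{$kw}++;
        }
    }

    # N => number of reference keywords present in at least N lists
    my %at_least;
    my $total = 1 + scalar keys %$others;
    for my $n (2 .. $total) {
        $at_least{$n} = grep { $_ >= $n } values %seen_in;
    }
    return (\%per_list, \%at_least);
}

# Usage with the sample data from the question.
my @Al = qw(abel abel baker camera delta edward fargo golfer jerky);
my %others = (
    Bob    => [qw(baker camera delta delta edward fargo golfer hilton)],
    Carmen => [qw(fargo golfer hilton icon icon jerky kappa)],
    Don    => [qw(fargo icon jerky)],
    Ed     => [qw(fargo icon icon jerky)],
);
my ($per_list, $at_least) = overlap_counts(\@Al, \%others);
print join(' ',
    (map { "count-$_:$per_list->{$_}" } sort keys %$per_list),
    (map { "count$_+:$at_least->{$_}" } sort { $a <=> $b } keys %$at_least),
), "\n";
```

Each keyword is touched a constant number of times, so the work grows linearly with the total number of keywords, which should be acceptable for lists of a hundred thousand entries.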
Thanks a lot. Cheers
In reply to compare a list against multiple lists by raiten