raiten has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I want to compare a list of keywords against multiple other lists and report how many keywords match each individual list, how many appear in 2 lists, in 3 lists, and so on. The keywords come from a text file (separated by \n, \t, comma or semicolon), but could also come from a DBI database in the future. It must scale, as each list can hold from a thousand to a hundred thousand keywords.
Reading from a text file seems easy, though I am not sure about performance:
http://www.perlmonks.org/?node_id=45868
http://stackoverflow.com/questions/761392/easiest-way-to-open-a-text-file-and-read-it-into-an-array-with-perl
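A minimal sketch of the file-reading part, assuming the separators really are newline, tab, comma and semicolon, and that a whole file fits in memory (a few hundred thousand short keywords should). The helper names `parse_keywords` and `read_keywords` are made up for illustration; they are not taken from the linked pages.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw text on newline, tab, comma or semicolon,
# trim surrounding whitespace and drop empty fields.
sub parse_keywords {
    my ($text) = @_;
    my @keywords;
    for my $field (split /[\r\n\t,;]+/, $text) {
        $field =~ s/^\s+|\s+$//g;
        push @keywords, $field if length $field;
    }
    return @keywords;
}

# Hypothetical helper: slurp a whole file, then parse it in one pass.
sub read_keywords {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                       # undef record separator = read everything
    my $text = <$fh>;
    close $fh;
    return parse_keywords(defined $text ? $text : '');
}

# Demo on an in-memory string; read_keywords('some_file.txt') would
# behave the same way for a file with the same content.
my @demo = parse_keywords("abel\tbaker, camera;delta\n\nedward");
print join(' ', @demo), "\n";       # abel baker camera delta edward
```

Slurping and splitting once avoids a per-line loop; whether that is actually faster for a given file is something to measure, not assume.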
While googling for list comparison, I found these 2 interesting solutions:
http://stackoverflow.com/questions/720482/how-can-i-verify-that-a-value-is-present-in-an-array-list-in-perl
http://search.cpan.org/~jkeenan/List-Compare-0.37/lib/List/Compare.pm#Multiple_Case:_Compare_Three_or_More_Lists
List::Compare seems the most promising; I just have to optimise the text-file-to-array part.
But how do I get keyword counts across multiple lists? This is what I have so far:

    use List::Compare;

    ## Al being the reference list, compared to the others
    @Al     = qw(abel abel baker camera delta edward fargo golfer jerky);
    @Bob    = qw(baker camera delta delta edward fargo golfer hilton);
    @Carmen = qw(fargo golfer hilton icon icon jerky kappa);
    @Don    = qw(fargo icon jerky);
    @Ed     = qw(fargo icon icon jerky);
    my %list = (0 => 'Al', 1 => 'Bob', 2 => 'Carmen', 3 => 'Don', 4 => 'Ed');

    $lcm = List::Compare->new(\@Al, \@Bob, \@Carmen, \@Don, \@Ed);
    if (@intersectionAll = $lcm->get_intersection) {
        $all = (@intersectionAll);
    }

    for (my $j = 1; $j < 5; ++$j) {
        $lcm0 = List::Compare->new(\@{$list{0}}, \@{$list{$j}});
        $intername = "intersection-0-$j";
        if (@{$intername} = $lcm0->get_intersection) {
            ${"count-$intername"} = (@{$intername});
        }
    }

    ## howto get keywords count which are in 2 lists, 3 lists, ... ?

    my $out = "";
    for (my $k = 1; $k < 5; ++$k) {
        $out .= "count-$list{$k}:".${"count-intersection-0-$k"}." ";
    }
    $out .= " all:$all\n";
    print $out;

The output I am after is:

    count-Bob:6 count-Carmen:3 count-Don:2 count-Ed:2 count2+:0 count3+:2

with count3+ representing how many keywords are in at least 3 lists.
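For the "in 2 lists, 3 lists, ..." part, a rough sketch with plain hashes instead of List::Compare. It assumes "countN+" means "keywords of the reference list that occur in at least N of the other lists", which is only one possible reading of the desired output above. The helper names `lists_per_keyword` and `count_at_least` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for each distinct keyword of the reference list,
# count how many of the other lists contain it at least once.
sub lists_per_keyword {
    my ($ref_list, @others) = @_;
    my %hits = map { $_ => 0 } @$ref_list;    # duplicates collapse here
    for my $other (@others) {
        my %seen;
        for my $kw (grep { !$seen{$_}++ } @$other) {   # each keyword once per list
            $hits{$kw}++ if exists $hits{$kw};
        }
    }
    return \%hits;
}

# Hypothetical helper: number of keywords found in at least $n other lists.
sub count_at_least {
    my ($hits, $n) = @_;
    return scalar grep { $_ >= $n } values %$hits;
}

my @Al     = qw(abel abel baker camera delta edward fargo golfer jerky);
my @Bob    = qw(baker camera delta delta edward fargo golfer hilton);
my @Carmen = qw(fargo golfer hilton icon icon jerky kappa);
my @Don    = qw(fargo icon jerky);
my @Ed     = qw(fargo icon icon jerky);

my $hits = lists_per_keyword(\@Al, \@Bob, \@Carmen, \@Don, \@Ed);
printf "count%d+:%d ", $_, count_at_least($hits, $_) for 1 .. 4;
print "\n";
```

Each list is walked once and every lookup is a hash access, so the cost grows with the total number of keywords rather than with the number of list pairs, which should matter at a hundred thousand keywords per list.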
Thanks a lot. Cheers
Re: compare a list against multiple lists
by rjt (Curate) on Mar 19, 2013 at 23:41 UTC
by raiten (Acolyte) on Mar 20, 2013 at 15:40 UTC
by rjt (Curate) on Mar 21, 2013 at 23:28 UTC

Re: compare a list against multiple lists
by LanX (Saint) on Mar 19, 2013 at 23:24 UTC