I can see how to present the number of things in common, but this "number of things that are different" is causing me to stumble just a bit. Comparing a,b,c,d vs x,y,z: there is nothing in common (that number is zero) - what is the difference number? 4 or 3 or what?
Update:
To find the common things, one way is to set up 2 translation hash tables like below - these can be used in combination to achieve that goal, but I am still unsure about "what difference means".
#!/usr/bin/perl -w
use strict;
use Data::Dump qw(pp);
my %country_2_name;
my %name_2_country;
while (<DATA>)
{
s/\s*$//; # trim trailing spaces (also chomps)
my ($name, $countries) = split(' ',$_,2);
my @these_countries = split(/,/,$countries);
$name_2_country{$name} = [@these_countries];
foreach my $this_country (@these_countries)
{
push @{$country_2_name{$this_country}}, $name;
}
}
pp \%country_2_name;
pp \%name_2_country;
=prints
{
Amsterdam => ["Name4"],
Canada => ["Name1", "Name2", "Name3", "Name5"],
China => ["Name3"],
HongKong => ["Name3"],
India => ["Name2", "Name5"],
Ireland => ["Name4"],
London => ["Name4"],
Portugal => ["Name2"],
USA => ["Name1", "Name4", "Name5"],
Yemen => ["Name1"],
}
{
Name1 => ["USA", "Canada", "Yemen"],
Name2 => ["Canada", "Portugal", "India"],
Name3 => ["China", "HongKong", "Canada"],
Name4 => ["London", "Amsterdam", "Ireland", "USA"],
Name5 => ["India", "USA", "Canada"],
}
=cut
__DATA__
Name1 USA,Canada,Yemen
Name2 Canada,Portugal,India
Name3 China,HongKong,Canada
Name4 London,Amsterdam,Ireland,USA
Name5 India,USA,Canada
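On the "what does difference mean" question: one possible reading, and only an assumption on my part, is that the difference is directional, counting the items in the first list that are missing from the second. Under that reading a,b,c,d vs x,y,z gives 4 one way and 3 the other. A minimal sketch (the helper name `count_only_in_first` is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: count the items of @$x
# that do not appear in @$y (the size of the set difference X \ Y).
# Assumes neither list contains repeated items.
sub count_only_in_first {
    my ($x, $y) = @_;
    my %in_y = map { $_ => 1 } @$y;          # lookup table for the second list
    return scalar grep { !$in_y{$_} } @$x;   # items of the first list not found in it
}

my @left  = qw(a b c d);
my @right = qw(x y z);

my $left_only  = count_only_in_first(\@left,  \@right);
my $right_only = count_only_in_first(\@right, \@left);

print "in left but not in right: $left_only\n";
print "in right but not in left: $right_only\n";
print "both directions combined: ", $left_only + $right_only, "\n";
```

If a single symmetric number is wanted instead, the sum of the two directions is one candidate, but which of these the original question intends is not something I can tell from the data alone.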
| [reply] [d/l] |
Hey,
THANKS SO MUCH for your quick reply and for trying. I got a 2nd reply as well, which also helps.
Thanks so much, Perl Monks are GREAT!
| [reply] |
This is probably a bit long-winded but does seem to work:
my (@names,%hash,%matrix);
while (my $line = <DATA>) {
chomp($line);
my ($name,$list) = split(m/\s+/,$line);
push(@names,$name);
$hash{$name} = [ split(',',$list) ];
}
for my $name (@names) {
my $countries = $hash{$name};
for my $name2 (@names) {
my $diff = get_diff($countries,$hash{$name2});
push( @{ $matrix{$name} }, $diff );
}
}
sub get_diff {
my ($x,$y) = @_;
my (%union,%isect);
for my $item (@$x,@$y) {
$union{$item}++ && $isect{$item}++;
}
return scalar @$x - scalar keys %isect;
}
print "ID\t" . join("\t",@names) . "\n";
for my $name (@names) {
print "$name\t" . join("\t", @{ $matrix{$name} } ) . "\n";
}
__DATA__
Name1 USA,Canada,Yemen
Name2 Canada,Portugal,India
Name3 China,HongKong,Canada
Name4 London,Amsterdam,Ireland,USA
Name5 India,USA,Canada
Output:
ID Name1 Name2 Name3 Name4 Name5
Name1 0 2 2 2 1
Name2 2 0 2 3 1
Name3 2 2 0 3 2
Name4 3 4 4 0 3
Name5 1 1 2 2 0
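A caveat, as far as I can tell: the `$union{$item}++ && $isect{$item}++` idiom in get_diff treats any item seen twice as shared, so a list that repeats a country would be miscounted. The sample data has no repeats, so the output above is unaffected. A rough sketch of a variant that tolerates repeats (`get_diff_dedup` is a hypothetical name, not part of the code above):

```perl
use strict;
use warnings;

# Hypothetical variant of get_diff: count the distinct items of @$x
# that are absent from @$y, even if either list repeats an item.
sub get_diff_dedup {
    my ($x, $y) = @_;
    my %in_y = map { $_ => 1 } @$y;               # membership test for the second list
    my %only_x;
    $only_x{$_} = 1 for grep { !$in_y{$_} } @$x;  # hash keys collapse repeats in @$x
    return scalar keys %only_x;
}

# Repeated item in the second list: USA is still missing from it.
print get_diff_dedup([qw(USA)], [qw(Canada Canada)]), "\n";

# Name4 vs Name1 from the sample data.
print get_diff_dedup([qw(London Amsterdam Ireland USA)], [qw(USA Canada Yemen)]), "\n";
```

The second call reproduces the Name4/Name1 cell of the matrix above, so on duplicate-free input the two approaches should agree.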
| [reply] [d/l] [select] |
THANK YOU SO MUCH!!!
Wow, thanks so much for your quick reply, and it works beautifully!
| [reply] |
T'aint pretty, but I think this is what you asked for:
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
my %names = map{
my( $name, $rest ) = split;
$name, { map{ $_, undef } split ',', $rest };
} <DATA>;
my @sortedKeys = sort keys %names;
print "\t", join "\t", @sortedKeys;
for my $i ( @sortedKeys ) {
printf "%s\t", $i;
for my $j ( @sortedKeys ) {
my $nCitiesI = keys %{ $names{ $i } };
my $nMatchingCitiesJ = grep{
exists $names{ $j }{ $_ }
} keys %{ $names{ $i } };
printf "%d\t", $nCitiesI - $nMatchingCitiesJ;
}
print '';
}
__DATA__
Name1 USA,Canada,Yemen
Name2 Canada,Portugal,India
Name3 China,HongKong,Canada
Name4 London,Amsterdam,Ireland,USA
Name5 India,USA,Canada
Produces:
C:\test>junk.pl
Name1 Name2 Name3 Name4 Name5
Name1 0 2 2 2 1
Name2 2 0 2 3 1
Name3 2 2 0 3 2
Name4 3 4 4 0 3
Name5 1 1 2 2 0
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
| [reply] [d/l] [select] |
Hey, wow, thanks again so much!!
Now I have 3 solutions to review.
Angel
| [reply] |
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
my %names = map{
my( $name, $rest ) = split;
$name, { map{ $_, undef } split ',', $rest };
} <DATA>;
my @sortedKeys = sort keys %names;
print "\t", join "\t", @sortedKeys;
for my $i ( @sortedKeys ) {
my @keysI = keys %{ $names{ $i } };
printf "%s\t", $i;
for my $j ( @sortedKeys ) {
my $nMatchingCitiesJ = grep{
exists $names{ $j }{ $_ }
} @keysI;
printf "%d\t", @keysI - $nMatchingCitiesJ;
}
print '';
}
__DATA__
Name1 USA,Canada,Yemen
Name2 Canada,Portugal,India
Name3 China,HongKong,Canada
Name4 London,Amsterdam,Ireland,USA
Name5 India,USA,Canada
| [reply] [d/l] |
Using pseudo(hash-based)-bit-vectors, this code is somewhat easier on the eyes (IMHO):
#! perl
use strict;
my (%name);
while (<DATA>){
my( $n, $rest ) = split;
$name{$n}{$_}=1 for ( split ',', $rest );
}
print "\t",map ({"$_\t"} sort keys %name),"\n"; # First line
for my $n1( sort keys %name){
print "$n1:\t";
for my $n2( sort keys %name){
my $count=scalar keys %{$name{$n2}};
$name{$n2}{$_} and $count-- for keys %{$name{$n1}};
print "$count\t";
}
print "\n";
}
__DATA__
Name1 USA,Canada,Yemen
Name2 Canada,Portugal,India
Name3 China,HongKong,Canada
Name4 London,Amsterdam,Ireland,USA
Name5 India,USA,Canada
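My results are transposed relative to the other replies because I count the other way round: each cell is the number of countries listed for the column name that are missing from the row name.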
All great truths begin as blasphemies.
― George Bernard Shaw, writer, Nobel laureate (1856-1950)
| [reply] [d/l] |
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
my (%incl, %excl);
while (<DATA>) {
my ($id, $loc_list) = split;
my @locs = split /,/ => $loc_list;
@{$incl{$id}}{@locs} = (1) x @locs;
}
for my $in (keys %incl) {
for my $ex (keys %incl) {
$excl{$in}{$ex} = [ grep { ! $incl{$ex}{$_} } keys %{$incl{$in}} ];
}
}
say join qq{\t} => q{ID}, sort(keys %incl);
for my $id (sort keys %incl) {
say join qq{\t} => $id, map { scalar @{$excl{$id}{$_}} } sort keys %incl;
}
__DATA__
Name1 USA,Canada,Yemen
Name2 Canada,Portugal,India
Name3 China,HongKong,Canada
Name4 London,Amsterdam,Ireland,USA
Name5 India,USA,Canada
Output:
ID Name1 Name2 Name3 Name4 Name5
Name1 0 2 2 2 1
Name2 2 0 2 3 1
Name3 2 2 0 3 2
Name4 3 4 4 0 3
Name5 1 1 2 2 0
| [reply] [d/l] [select] |
Hey everyone,
Srsly, thanks SO MUCH!!
| [reply] |