Doing pair-exclusivity analysis and building a matrix

angerusso has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Doing pair-exclusivity analysis and building a matrix by Marshall (Canon) on Mar 28, 2012 at 01:29 UTC
I can see how to present the number of things in common, but this "number of things that are different" is causing me to stumble just a bit. Comparing a,b,c,d vs x,y,z: there is nothing in common (that number is zero) - what is the difference number? 4 or 3 or what? Update: To find the common things, one way is to set up 2 translation hash tables like below - these can be used in combination to achieve that goal, but I am still unsure about "what difference means". #!/usr/bin/perl -w use strict; use Data::Dump qw(pp); my %country_2_name; my %name_2_country; while (<DATA>) { s/\s*$//; # trim trailing spaces (also chomps) my ($name, $countries) = split(' ',$_,2); my @these_countries = split(/,/,$countries); $name_2_country{$name} = [@these_countries]; foreach my $this_country (@these_countries) { push @{$country_2_name{$this_country}}, $name; } } pp \%country_2_name; pp \%name_2_country; =prints { Amsterdam => ["Name4"], Canada => ["Name1", "Name2", "Name3", "Name5"], China => ["Name3"], HongKong => ["Name3"], India => ["Name2", "Name5"], Ireland => ["Name4"], London => ["Name4"], Portugal => ["Name2"], USA => ["Name1", "Name4", "Name5"], Yemen => ["Name1"], } { Name1 => ["USA", "Canada", "Yemen"], Name2 => ["Canada", "Portugal", "India"], Name3 => ["China", "HongKong", "Canada"], Name4 => ["London", "Amsterdam", "Ireland", "USA"], Name5 => ["India", "USA", "Canada"], } =cut __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada [download]	[reply] [d/l]
Re^2: Doing pair-exclusivity analysis and building a matrix by angerusso (Novice) on Mar 28, 2012 at 02:47 UTC
Hey, THANKS SO MUCH for your quick reply and trying. I got a 2nd rpely as well which also helps. Thanks so much, perl monkers are GREAT!	[reply]
Re: Doing pair-exclusivity analysis and building a matrix by tangent (Parson) on Mar 28, 2012 at 02:25 UTC
This is probably a bit long-winded but does seem to work: my (@names,%hash,%matrix); while (my $line = <DATA>) { chomp($line); my ($name,$list) = split(m/\s+/,$line); push(@names,$name); $hash{$name} = [ split(',',$list) ]; } for my $name (@names) { my $countries = $hash{$name}; for my $name2 (@names) { my $diff = get_diff($countries,$hash{$name2}); push( @{ $matrix{$name} }, $diff ); } } sub get_diff { my ($x,$y) = @_; my (%union,%isect); for my $item (@$x,@$y) { $union{$item}++ && $isect{$item}++; } return scalar @$x - scalar keys %isect; } print "ID\t" . join("\t",@names) . "\n"; for my $name (@names) { print "$name\t" . join("\t", @{ $matrix{$name} } ) . "\n"; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada [download] Output: `ID Name1 Name2 Name3 Name4 Name5 Name1 0 2 2 2 1 Name2 2 0 2 3 1 Name3 2 2 0 3 2 Name4 3 4 4 0 3 Name5 1 1 2 2 0` [download]	[reply] [d/l] [select]
Re^2: Doing pair-exclusivity analysis and building a matrix by angerusso (Novice) on Mar 28, 2012 at 02:46 UTC
THANK YOU SO MUCH!!! Wow, thanks so much for your quick reply and it works beautifully!~	[reply]
Re: Doing pair-exclusivity analysis and building a matrix by BrowserUk (Patriarch) on Mar 28, 2012 at 02:29 UTC
T'aint pretty, but I think this is what you asked for: #! perl -slw use strict; use Data::Dump qw[ pp ]; my %names = map{ my( $name, $rest ) = split; $name, { map{ $_, undef } split ',', $rest }; } <DATA>; my @sortedKeys = sort keys %names; print "\t", join "\t", @sortedKeys; for my $i ( @sortedKeys ) { printf "%s\t", $i; for my $j ( @sortedKeys ) { my $nCitiesI = keys %{ $names{ $i } }; my $nMatchingCitiesJ = grep{ exists $names{ $j }{ $_ } } keys %{ $names{ $i } }; printf "%d\t", $nCitiesI - $nMatchingCitiesJ; } print ''; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada [download] Produces: `C:\test>junk.pl Name1 Name2 Name3 Name4 Name5 Name1 0 2 2 2 1 Name2 2 0 2 3 1 Name3 2 2 0 3 2 Name4 3 4 4 0 3 Name5 1 1 2 2 0` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re^2: Doing pair-exclusivity analysis and building a matrix by angerusso (Novice) on Mar 28, 2012 at 02:49 UTC
Hey, WOw, thanks again so much!! Now I have 3 codes to review. Angel	[reply]
Re^3: Doing pair-exclusivity analysis and building a matrix by BrowserUk (Patriarch) on Mar 28, 2012 at 03:51 UTC
A couple of minor optimisations: #! perl -slw use strict; use Data::Dump qw[ pp ]; my %names = map{ my( $name, $rest ) = split; $name, { map{ $_, undef } split ',', $rest }; } <DATA>; my @sortedKeys = sort keys %names; print "\t", join "\t", @sortedKeys; for my $i ( @sortedKeys ) { my @keysI = keys %{ $names{ $i } }; printf "%s\t", $i; for my $j ( @sortedKeys ) { my $nMatchingCitiesJ = grep{ exists $names{ $j }{ $_ } } @keysI; printf "%d\t", @keysI - $nMatchingCitiesJ; } print ''; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l]
Re: Doing pair-exclusivity analysis and building a matrix by NetWallah (Canon) on Mar 28, 2012 at 05:35 UTC
Using pseudo(hash-based)-bit-vectors, this code is somewhat easier on the eyes (IMHO): #! perl use strict; my (%name); while (<DATA>){ my( $n, $rest ) = split; $name{$n}{$_}=1 for ( split ',', $rest ); } print "\t",map ({"$_\t"} sort keys %name),"\n"; # First line for my $n1( sort keys %name){ print "$n1:\t"; for my $n2( sort keys %name){ my $count=scalar keys %{$name{$n2}}; $name{$n2}{$_} and $count-- for keys %{$name{$n1}}; print "$count\t"; } print "\n"; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada [download] My results seem transposed because of a different approach to counting. All great truths begin as blasphemies. ― George Bernard Shaw, writer, Nobel laureate (1856-1950)	[reply] [d/l]
Re: Doing pair-exclusivity analysis and building a matrix by kcott (Archbishop) on Mar 28, 2012 at 09:12 UTC
This seemed like a interesting mental exercise. Here's my take: #!/usr/bin/env perl use 5.010; use strict; use warnings; my (%incl, %excl); while (<DATA>) { my ($id, $loc_list) = split; my @locs = split /,/ => $loc_list; @{$incl{$id}}{@locs} = (1) x @locs; } for my $in (keys %incl) { for my $ex (keys %incl) { $excl{$in}{$ex} = [ grep { ! $incl{$ex}{$_} } keys %{$incl{$in +}} ]; } } say join qq{\t} => q{ID}, sort(keys %incl); for my $id (sort keys %incl) { say join qq{\t} => $id, map { scalar @{$excl{$id}{$_}} } sort keys + %incl; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada [download] Output: `ID Name1 Name2 Name3 Name4 Name5 Name1 0 2 2 2 1 Name2 2 0 2 3 1 Name3 2 2 0 3 2 Name4 3 4 4 0 3 Name5 1 1 2 2 0` [download] -- Ken	[reply] [d/l] [select]
Re^2: Doing pair-exclusivity analysis and building a matrix by angerusso (Novice) on Mar 28, 2012 at 17:58 UTC
Hey everyone, Srsly, thanks SO MUCH!!	[reply]