angerusso has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonkers:

I ran into a fairly trivial problem theoretically but I am blanking out how to code it in Perl. I thought I was pretty good with doing simple stuff in Perl and I feel clueless how to start for this problem. What I want is to use the table below to derive a matrix saying for each pair of persons, how many countries/cities are different between them.

Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada
ID Name1 Name2 Name3 Name4 Name5 Name1 0 2 2 2 1 Name2 Name3 Name4 Name5

Any help is highly appreciated!! <\p>

Replies are listed 'Best First'.
Re: Doing pair-exclusivity analysis and building a matrix
by Marshall (Canon) on Mar 28, 2012 at 01:29 UTC
    I can see how to present the number of things in common, but this "number of things that are different" is causing me to stumble just a bit. Comparing a,b,c,d vs x,y,z: there is nothing in common (that number is zero) - what is the difference number? 4 or 3 or what?

    Update:
    To find the common things, one way is to set up 2 translation hash tables like below - these can be used in combination to achieve that goal, but I am still unsure about "what difference means".

    #!/usr/bin/perl -w use strict; use Data::Dump qw(pp); my %country_2_name; my %name_2_country; while (<DATA>) { s/\s*$//; # trim trailing spaces (also chomps) my ($name, $countries) = split(' ',$_,2); my @these_countries = split(/,/,$countries); $name_2_country{$name} = [@these_countries]; foreach my $this_country (@these_countries) { push @{$country_2_name{$this_country}}, $name; } } pp \%country_2_name; pp \%name_2_country; =prints { Amsterdam => ["Name4"], Canada => ["Name1", "Name2", "Name3", "Name5"], China => ["Name3"], HongKong => ["Name3"], India => ["Name2", "Name5"], Ireland => ["Name4"], London => ["Name4"], Portugal => ["Name2"], USA => ["Name1", "Name4", "Name5"], Yemen => ["Name1"], } { Name1 => ["USA", "Canada", "Yemen"], Name2 => ["Canada", "Portugal", "India"], Name3 => ["China", "HongKong", "Canada"], Name4 => ["London", "Amsterdam", "Ireland", "USA"], Name5 => ["India", "USA", "Canada"], } =cut __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada
      Hey, THANKS SO MUCH for your quick reply and trying. I got a 2nd rpely as well which also helps. Thanks so much, perl monkers are GREAT!
Re: Doing pair-exclusivity analysis and building a matrix
by tangent (Parson) on Mar 28, 2012 at 02:25 UTC
    This is probably a bit long-winded but does seem to work:
    my (@names,%hash,%matrix); while (my $line = <DATA>) { chomp($line); my ($name,$list) = split(m/\s+/,$line); push(@names,$name); $hash{$name} = [ split(',',$list) ]; } for my $name (@names) { my $countries = $hash{$name}; for my $name2 (@names) { my $diff = get_diff($countries,$hash{$name2}); push( @{ $matrix{$name} }, $diff ); } } sub get_diff { my ($x,$y) = @_; my (%union,%isect); for my $item (@$x,@$y) { $union{$item}++ && $isect{$item}++; } return scalar @$x - scalar keys %isect; } print "ID\t" . join("\t",@names) . "\n"; for my $name (@names) { print "$name\t" . join("\t", @{ $matrix{$name} } ) . "\n"; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada
    Output:
    ID Name1 Name2 Name3 Name4 Name5 Name1 0 2 2 2 1 Name2 2 0 2 3 1 Name3 2 2 0 3 2 Name4 3 4 4 0 3 Name5 1 1 2 2 0
      THANK YOU SO MUCH!!! Wow, thanks so much for your quick reply and it works beautifully!~
Re: Doing pair-exclusivity analysis and building a matrix
by BrowserUk (Patriarch) on Mar 28, 2012 at 02:29 UTC

    T'aint pretty, but I think this is what you asked for:

    #! perl -slw use strict; use Data::Dump qw[ pp ]; my %names = map{ my( $name, $rest ) = split; $name, { map{ $_, undef } split ',', $rest }; } <DATA>; my @sortedKeys = sort keys %names; print "\t", join "\t", @sortedKeys; for my $i ( @sortedKeys ) { printf "%s\t", $i; for my $j ( @sortedKeys ) { my $nCitiesI = keys %{ $names{ $i } }; my $nMatchingCitiesJ = grep{ exists $names{ $j }{ $_ } } keys %{ $names{ $i } }; printf "%d\t", $nCitiesI - $nMatchingCitiesJ; } print ''; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada

    Produces:

    C:\test>junk.pl Name1 Name2 Name3 Name4 Name5 Name1 0 2 2 2 1 Name2 2 0 2 3 1 Name3 2 2 0 3 2 Name4 3 4 4 0 3 Name5 1 1 2 2 0

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Hey, WOw, thanks again so much!! Now I have 3 codes to review. Angel

        A couple of minor optimisations:

        #! perl -slw use strict; use Data::Dump qw[ pp ]; my %names = map{ my( $name, $rest ) = split; $name, { map{ $_, undef } split ',', $rest }; } <DATA>; my @sortedKeys = sort keys %names; print "\t", join "\t", @sortedKeys; for my $i ( @sortedKeys ) { my @keysI = keys %{ $names{ $i } }; printf "%s\t", $i; for my $j ( @sortedKeys ) { my $nMatchingCitiesJ = grep{ exists $names{ $j }{ $_ } } @keysI; printf "%d\t", @keysI - $nMatchingCitiesJ; } print ''; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Re: Doing pair-exclusivity analysis and building a matrix
by NetWallah (Canon) on Mar 28, 2012 at 05:35 UTC
    Using pseudo(hash-based)-bit-vectors, this code is somewhat easier on the eyes (IMHO):
    #! perl use strict; my (%name); while (<DATA>){ my( $n, $rest ) = split; $name{$n}{$_}=1 for ( split ',', $rest ); } print "\t",map ({"$_\t"} sort keys %name),"\n"; # First line for my $n1( sort keys %name){ print "$n1:\t"; for my $n2( sort keys %name){ my $count=scalar keys %{$name{$n2}}; $name{$n2}{$_} and $count-- for keys %{$name{$n1}}; print "$count\t"; } print "\n"; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada
    My results seem transposed because of a different approach to counting.

                 All great truths begin as blasphemies.
                       ― George Bernard Shaw, writer, Nobel laureate (1856-1950)

Re: Doing pair-exclusivity analysis and building a matrix
by kcott (Archbishop) on Mar 28, 2012 at 09:12 UTC

    This seemed like a interesting mental exercise. Here's my take:

    #!/usr/bin/env perl use 5.010; use strict; use warnings; my (%incl, %excl); while (<DATA>) { my ($id, $loc_list) = split; my @locs = split /,/ => $loc_list; @{$incl{$id}}{@locs} = (1) x @locs; } for my $in (keys %incl) { for my $ex (keys %incl) { $excl{$in}{$ex} = [ grep { ! $incl{$ex}{$_} } keys %{$incl{$in +}} ]; } } say join qq{\t} => q{ID}, sort(keys %incl); for my $id (sort keys %incl) { say join qq{\t} => $id, map { scalar @{$excl{$id}{$_}} } sort keys + %incl; } __DATA__ Name1 USA,Canada,Yemen Name2 Canada,Portugal,India Name3 China,HongKong,Canada Name4 London,Amsterdam,Ireland,USA Name5 India,USA,Canada

    Output:

    ID Name1 Name2 Name3 Name4 Name5 Name1 0 2 2 2 1 Name2 2 0 2 3 1 Name3 2 2 0 3 2 Name4 3 4 4 0 3 Name5 1 1 2 2 0

    -- Ken

      Hey everyone, Srsly, thanks SO MUCH!!