Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have four arrays (A, B, C and D). Each array has been compared against each of the others to find pairs of items, covering all 6 combinations (A v B, A v C, ...). My problem is that I can't find a simple way to pull out groups of items that, for example, match each other in A, B and C but not in D: item A1 is paired with B2, B2 is paired with C3 and C3 is paired with A1, but there is no equivalent pair in D. This is getting very confusing for me and my code is getting long and complicated. Can someone please suggest a nice way to do this?

Let me know if the problem isn't clear!

Here is an example of the data. All comparison files have been split in two, giving 12 arrays; the first element of @A_v_B1 is paired with the first element of @A_v_B2, and so on.

This example shows the pairwise comparisons of A, B and C only: A101 matches B302, and both match C302.

@A_v_B1  @A_v_B2
A101     B302
A103     B405
A104     B406

@A_v_C1  @A_v_C2
A101     C302
A106     C305
A109     C306

@B_v_C1  @B_v_C2
B302     C302
B103     C415
B104     C416

Re: confusing array comparison
by BrowserUk (Patriarch) on Feb 07, 2006 at 12:34 UTC

    Would something like this work?

    #! perl -slw
    use strict;
    use List::Util qw[ shuffle ];

    our $N ||= 10;
    our $E ||= 5;

    my @a = (shuffle 1 .. $N)[ 0 .. $E ];
    my @b = (shuffle 1 .. $N)[ 0 .. $E ];
    my @c = (shuffle 1 .. $N)[ 0 .. $E ];
    my @d = (shuffle 1 .. $N)[ 0 .. $E ];

    my %comp;
    $comp{ $_ }{a} = 'a' for @a;
    $comp{ $_ }{b} = 'b' for @b;
    $comp{ $_ }{c} = 'c' for @c;
    $comp{ $_ }{d} = 'd' for @d;

    print "     : a b c d\n--------------";
    printf "%4s : %s\n", $_, join ' ', map{ $_||'-' } @{ $comp{ $_ } }{ 'a'..'d' }
        for sort{$a<=>$b} keys %comp;

    __END__
    c:\test>528473
         : a b c d
    --------------
       1 : - - c -
       2 : a - c d
       3 : a b - -
       4 : a b c -
       5 : a b c d
       6 : a b - d
       7 : - b - d
       8 : - - c d
       9 : - - c -
      10 : a b - d

    c:\test>528473 -N=10 -E=8
         : a b c d
    --------------
       1 : a - c d
       2 : a b c d
       3 : a b c d
       4 : a b - d
       5 : a b c d
       6 : a b c d
       7 : a b c d
       8 : - b c -
       9 : a b c d
      10 : a b c d

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi there,

      Thanks for your answer! I don't really understand your code, so I'm not sure how to adapt it to work with real data.

        This first bit is just setting up 4 arrays with random integer test data.

        my @a = (shuffle 1 .. $N)[ 0 .. $E ];
        my @b = (shuffle 1 .. $N)[ 0 .. $E ];
        my @c = (shuffle 1 .. $N)[ 0 .. $E ];
        my @d = (shuffle 1 .. $N)[ 0 .. $E ];

        This bit builds a hash of hashes using the values from the arrays as primary keys and the 'name' of the array that value came from as the secondary key.

        my %comp;
        $comp{ $_ }{a} = 'a' for @a;
        $comp{ $_ }{b} = 'b' for @b;
        $comp{ $_ }{c} = 'c' for @c;
        $comp{ $_ }{d} = 'd' for @d;

        Print a header:

        print "     : a b c d\n--------------";

        And this bit does the display

        printf "%4s : %s\n", $_, join ' ', map{ $_||'-' } @{ $comp{ $_ } }{ 'a'..'d' } for sort{$a<=>$b} keys %comp;

        Reading it in reverse order (right to left):

        for sort{$a<=>$b} keys %comp;

        For each key in the primary hash (i.e. each unique value from the four arrays), sorted (in this case into ascending numeric order),

        @{ $comp{ $_ } }{ 'a'..'d' }

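        Take a hash slice of this value's inner hash, fetching the entries for keys 'a' through 'd' in that order (undef where the value did not occur in that array),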

        map{ $_||'-' }

        Pass the values through a map that replaces each undef with a token ('-') marking that this value was missing from that array.

        join ' ',

        Join the 'found' and 'missing' tokens into a single string, separated by spaces for presentation,

        printf "%4s : %s\n", $_,

        and print the value, followed by the string showing which arrays it was found in.
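To make the right-to-left reading concrete, here is a minimal sketch that unrolls the display one-liner into explicit steps on a small, hand-made %comp (the data are invented for illustration, and @lines is just a hypothetical holding array). It uses `defined $_ ? $_ : '-'` in place of `$_||'-'`; for the labels 'a' to 'd' the two behave the same, but `defined` would not mistake a stored 0 or empty string for a missing entry.

```perl
use strict;
use warnings;

# Hypothetical %comp in the same shape as above: value => { label => label }
my %comp = (
    1 => { c => 'c' },
    3 => { a => 'a', b => 'b' },
    6 => { a => 'a', b => 'b', d => 'd' },
);

my @lines;
for my $value ( sort { $a <=> $b } keys %comp ) {          # ascending numeric order
    my @slice  = @{ $comp{$value} }{ 'a' .. 'd' };          # hash slice, undef where missing
    my @tokens = map { defined $_ ? $_ : '-' } @slice;      # undef becomes '-'
    my $flags  = join ' ', @tokens;                         # e.g. "a b - d"
    push @lines, sprintf "%4s : %s", $value, $flags;        # value, then its flags
}

print "$_\n" for @lines;
```

The three lines printed are `   1 : - - c -`, `   3 : a b - -` and `   6 : a b - d`.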

        Each iteration of that loop gives you one line showing which arrays this value appeared in and which it was missing from. So for my test data, you read this line:

        6 : a b - d

        as "The value 6 appeared in arrays @a, @b and @d but was missing from @c".
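To adapt this to real data and to the case in the question (values present in A, B and C but not in D), here is a minimal sketch. It assumes your real arrays hold plain values that are identical across arrays, as the random integers are here; that assumption does not hold for the A101/B302 pairs in the question, where matching items have different names. The array contents are made up, %sets and @abc_not_d are illustrative names, and the four assignment lines are replaced by one loop over a hash of array references.

```perl
use strict;
use warnings;

# Hypothetical real data: four plain arrays (replace with your own).
my @A = qw(1 2 3 4 5);
my @B = qw(2 3 4 6);
my @C = qw(3 4 5 6);
my @D = qw(4 6 7);

# label => reference to the array, so adding a fifth array is one more entry
my %sets = ( a => \@A, b => \@B, c => \@C, d => \@D );

# Same shape as %comp above: $comp{value}{label} = label
my %comp;
for my $label ( sort keys %sets ) {
    $comp{$_}{$label} = $label for @{ $sets{$label} };
}

# Values found in a, b and c but not in d
my @abc_not_d = grep {
    my $seen = $comp{$_};
    $seen->{a} && $seen->{b} && $seen->{c} && !$seen->{d}
} sort { $a <=> $b } keys %comp;

print "@abc_not_d\n";
```

With this data it prints `3`: the value 4 is excluded because it also appears in @D.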


Re: confusing array comparison
by Roy Johnson (Monsignor) on Feb 07, 2006 at 15:34 UTC
    I had to read your description a few times to see that your question wasn't the one BrowserUk answered. Given the arrays you list above, try something like this:
    my %A;
    for (0..$#A_v_B1) {
        $A{$A_v_B1[$_]}{B} = $A_v_B2[$_];
    }
    for (0..$#A_v_C1) {
        $A{$A_v_C1[$_]}{C} = $A_v_C2[$_];
    }
    That will give you a hash A that looks like this:
    A101 => { B => B302, C => C302 },
    A103 => { B => B405 },
    A104 => { B => B406 },
    A106 => { C => C305 },
    A109 => { C => C306 },
    From there, you can see what elements of A have matches in which other sets, and what those matches are. You can build up hashes for B, C, and D similarly, if needed.
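Taking that idea one step further, here is a minimal sketch of the full search the question asks for, using one undirected lookup hash for all the pairs instead of one hash per set. Several assumptions are not from the question: each item's set is read from the first letter of its name, a pair links both items in both directions, and "no equivalent pair in D" is taken to mean that none of the three items has any partner in D (if you mean "no single D item paired with all three", change that one test). The question shows no D data, so the A_v_D pair below is invented; %link and partners() are hypothetical names.

```perl
use strict;
use warnings;

# Pair lists from the question: element $i of the first list is paired
# with element $i of the second list.
my %pairs = (
    A_v_B => [ [qw(A101 A103 A104)], [qw(B302 B405 B406)] ],
    A_v_C => [ [qw(A101 A106 A109)], [qw(C302 C305 C306)] ],
    B_v_C => [ [qw(B302 B103 B104)], [qw(C302 C415 C416)] ],
    # Hypothetical: the question shows no D data, so this pair is invented.
    A_v_D => [ [qw(A103)], [qw(D201)] ],
);

# Undirected lookup: $link{X}{Y} is true when X and Y were paired.
my %link;
for my $cmp ( values %pairs ) {
    my ( $left, $right ) = @$cmp;
    for my $i ( 0 .. $#$left ) {
        $link{ $left->[$i] }{ $right->[$i] } = 1;
        $link{ $right->[$i] }{ $left->[$i] } = 1;
    }
}

# Hypothetical helper: partners of $item in set $set, where (assumption)
# an item's set is the first letter of its name.
sub partners {
    my ( $item, $set ) = @_;
    return grep { /^\Q$set\E/ } keys %{ $link{$item} || {} };
}

my @triangles;
for my $a_item ( sort { $a cmp $b } grep { /^A/ } keys %link ) {
    for my $b_item ( sort { $a cmp $b } partners( $a_item, 'B' ) ) {
        for my $c_item ( sort { $a cmp $b } partners( $b_item, 'C' ) ) {
            # Close the triangle: C must also be paired with the A item.
            next unless $link{$c_item}{$a_item};
            # Skip it if any of the three items has a partner in D.
            next if grep { partners( $_, 'D' ) } $a_item, $b_item, $c_item;
            push @triangles, "$a_item $b_item $c_item";
        }
    }
}

print "$_\n" for @triangles;
```

With the question's data this reports the single group `A101 B302 C302`. Adding more comparison arrays only means adding entries to %pairs.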

    Caution: Contents may have been coded under pressure.
Re: confusing array comparison
by idle (Friar) on Feb 07, 2006 at 12:19 UTC
    You should use a hash, as the FAQ says, or in your case perhaps an array of hashes.
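A minimal sketch of the array-of-hashes idea, built with the lookup-hash idiom from perlfaq4 ("How can I tell whether a certain element is contained in a list or array?"): one hash per array, so each membership test is a single lookup instead of a scan through the array. The data and the names @in and @abc_not_d are hypothetical, and like the FAQ idiom it only works when matching items have identical values in every array.

```perl
use strict;
use warnings;

# Hypothetical plain-value arrays (matching items share the same value).
my @A = qw(x1 x2 x3 x4);
my @B = qw(x2 x3 x4 x5);
my @C = qw(x3 x4 x6);
my @D = qw(x4 x7);

# Array of hashes: $in[$i]{$value} is true if $value occurs in array $i.
my @in = map { my $aref = $_; +{ map { $_ => 1 } @$aref } } \@A, \@B, \@C, \@D;

# Walk @A itself, so membership in A is implied; test B, C and D by lookup.
my @abc_not_d = grep { $in[1]{$_} && $in[2]{$_} && !$in[3]{$_} } @A;

print "@abc_not_d\n";
```

This prints `x3`; x4 is dropped because $in[3] shows it is also in @D.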
Re: confusing array comparison
by marto (Cardinal) on Feb 07, 2006 at 12:35 UTC
    Hi Anonymous monk,

    Have you had a look at the functionality provided by the List::Compare module?
    Take a look at the documentation to see if it will be useful for your particular problem.

    Hope this helps.

    Martin
Re: confusing array comparison
by graff (Chancellor) on Feb 08, 2006 at 05:40 UTC
    I'm not sure what sort of source data you're looking at, or what basis there is for saying things like "A101 matches B302" and so on. But if the four arrays actually have matching values, and you want to track the distribution of values across arrays, something like this might help:
    use strict;

    # pretend these are our four arrays:
    my @A = qw/we me you them us foo others folks/;
    my @B = qw/them I we he she it one foo/;
    my @C = qw/foo bar baz blah me he we she/;
    my @D = qw/this that another foo bar baz/;

    my %arefs = ( A => \@A, B => \@B, C => \@C, D => \@D );
    my %distro;
    for my $ary ( sort keys %arefs ) {
        my $aref = $arefs{$ary};
        for ( my $i=0; $i<@$aref; $i++ ) {
            my $elem = $$aref[$i];
            $distro{$elem} .= sprintf( " %s%3.3d ", $ary, $i );
        }
    }
    for my $elem ( sort keys %distro ) {
        print "$elem\t$distro{$elem}\n";
    }

    __OUTPUT__
    I        B001
    another  D002
    bar      C001  D004
    baz      C002  D005
    blah     C003
    folks    A007
    foo      A005  B007  C000  D003
    he       B003  C005
    it       B005
    me       A001  C004
    one      B006
    others   A006
    she      B004  C007
    that     D001
    them     A003  B000
    this     D000
    us       A004
    we       A000  B002  C006
    you      A002
    From here you can either save the output to a file and use other tools (or other Perl code) to work with the list, or add a few more steps to the script above that grep for values of interest in the %distro hash, e.g.:
    my @keylist = sort keys %distro;
    # sprintf(" %s%3.3d ") puts two spaces between entries, so match \s+ rather than one space
    my @hitAll4 = grep { $distro{$_} =~ /A\d+\s+B\d+\s+C\d+\s+D\d+/ } @keylist;
    my @missingD = grep { $distro{$_} !~ / D\d+/ } @keylist;
    # and so on.

    (updated the print statement to use tabs for nicer alignment)
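A variant sketch (not graff's code) that keeps the same four example arrays but records positions in a hash of hashes, $where{element}{array} = index, so the queries become `exists` tests instead of regexes over the formatted strings and no longer depend on how the output is spaced. It also covers the question's own case, in A, B and C but not D. %where and @abcNotD are illustrative names; if a value repeats within one array, only its last index is kept.

```perl
use strict;
use warnings;

my @A = qw/we me you them us foo others folks/;
my @B = qw/them I we he she it one foo/;
my @C = qw/foo bar baz blah me he we she/;
my @D = qw/this that another foo bar baz/;
my %arefs = ( A => \@A, B => \@B, C => \@C, D => \@D );

# $where{element}{array} = index of the element in that array
my %where;
for my $ary ( sort keys %arefs ) {
    my $aref = $arefs{$ary};
    $where{ $aref->[$_] }{$ary} = $_ for 0 .. $#$aref;
}

my @keylist  = sort keys %where;
my @hitAll4  = grep { my $w = $where{$_}; 4 == grep { exists $w->{$_} } qw(A B C D) } @keylist;
my @missingD = grep { !exists $where{$_}{D} } @keylist;
# The question's case: present in A, B and C but absent from D
my @abcNotD  = grep {
    my $w = $where{$_};
    ( 3 == grep { exists $w->{$_} } qw(A B C) ) && !exists $w->{D}
} @keylist;

print "all four: @hitAll4\n";
print "A, B, C but not D: @abcNotD\n";
```

With these arrays it prints `all four: foo` and `A, B, C but not D: we`, and the index of each hit is still available, e.g. `$where{we}{C}` is 6.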