Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have two arrays A and B. They have a many-to-many relationship. Each value of A can also appear several places in B, and each value of B can appear several places in A. Neither A nor B have unique values.

How can I get a list of all pairs of elements of A,B which share the same values?

Replies are listed 'Best First'.
Re: Many-To-Many Relationship
by chromatic (Archbishop) on Jun 09, 2000 at 04:36 UTC
RE: Many-To-Many Relationship
by Russ (Deacon) on Jun 10, 2000 at 04:19 UTC
    As I understood the question, you want a list of tuples like <nobr>(IndexInFirstArray, IndexInSecondArray)</nobr> for each pair of equal values. In other words, the Cartesian cross-product of matching values between the arrays.

    perlfaq4 doesn't seem to cover your situation (where the arrays have duplicate values).

    First, the code:

    my @A = (1,2,3,4,5,4,3,2,1,2); my @B = (2,4,6,8,10,8,6,4,2); my (%First, %Second); push @{ $First{ $A[$_] } }, $_ for (0..$#A); push @{ $Second{ $B[$_] } }, $_ for (0..$#B); my %Pairs; for my $Common (grep {exists $Second{$_}} keys %First){ for my $Idx (@{ $First{$Common} }){ push @{ $Pairs{$Common} }, map([$Idx, $_], @{ $Second{$Common} }); } } # Print out what we found for (sort keys %Pairs){ print "$_:\n", map(" $_\n", map(join('-', @$_), @{ $Pairs{$_} })), +"\n"; }
    Now, the explanation:

    After some sample data (@A and @B), we store each array's values in a hash tying those values to their indices in the array.

    push @{ $First{ $A[$_] } }, $_ for (0..$#A); push @{ $Second{ $B[$_] } }, $_ for (0..$#B);
    %First refers to @A and %Second to @B. For example, '5' is located at index 4 in @A, so its hash entry (if done manually) would look like: <nobr>$First{5} = [4]</nobr>
    '8' would look like: <nobr>$Second{8} = [3,5]</nobr> because it appears at indices 3 and 5 in @B.

    Now, we loop through all values common to both @A and @B:

    for my $Common (grep {exists $Second{$_}} keys %First){
    For each index in @A, we pair it up with each index in @B and add the tuple to %Pairs
    for my $Idx (@{ $First{$Common} }){ push @{ $Pairs{$Common} }, map([$Idx, $_], @{ $Second{$Common} }); }

    When finished, %Pairs looks like this (pseudo-code, of course):

    $Pairs{Each Common Value} = [ [A1, B1], [A1, B2], [A2, B1], [A2, B2] + ]
    For example, '4' looks like:
    $Pairs{4} = [[3,1], [3,7], [5,1], [5,7]]

    Since I tried to do everything else in one line, I next print out the contents of %Pairs. The output is:

    2: 1-0 1-8 7-0 7-8 9-0 9-8 4: 3-1 3-7 5-1 5-7
    None of this even resembles readable, easy-to-maintain code, but it sure was fun to write! :-)

    Russ

      Russ, thanks for your incredibly detailed response to my question. I am currently studying it as if it were the Bible. I was intrigued by the syntax: push @{ $First{ $A$_ } }, $_ for (0..$#A); I understand what this statement does, but not why it works. P. 259 of the Camel book says that "The argument to push must be a real array, not just a reference to an array." So far so good. It looks as if the syntax creates an anonymous array whose elements are the values of %First. But I see a problem. It says on page 531 of the Camel book that "Hash values do not spring into existence upon reference." So it seems to me that we have a deadlock here. Push demands a value of %First, but %First can only get its values from Push. It seems to me that at the first iteration the command is equivalent to: push @{$First{1}},0; But this ought to fail because at the time of the first iteration $First{1} does not exist. Can anyone explain?
        That's not exactly what it does. It's functionally the programmatic reverse of the array: push @{ $First{  $A[$_] } }, $_ for (0..$#A); Let's take it from the inside out:
        $A[$_] -- the element of @A at the current index $First{ $A[$_] } -- the previous line is a key in %First @{ } -- and the associated value is an anonymous array
        So what the code does is, loop over the number of elements of @A. (If it has ten, the loop goes from 0 to 9). For each of those indices, the value at that index becomes a key in %First, and the value is the element's index, stored in an anonymous array. We have something like this:
        Elements of @A: 0 zero 1 one 2 two Elements of %First: zero [ 0 ] one [ 1 ] two [ 2 ]
        While we can look something up in @A by its index, we can look up the index in %First if we know the something.

        Oh, and in Perl hashes have something called auto-vivification. If you access a hash key that doesn't exist (and try to put data there), the key will be created. (Useful, once you get used to it.)

        Hi,

        Thank you for your kind words. I'm sure my writing falls far short of canonization potential, but I'm flattered nonetheless. :-)

        chromatic has a great answer to your question. The quality and helpfulness of the people here at Perl Monks is a constant source of amazement for me. As he describes, I am utilizing auto-vivification in that terse burst of code to reverse the array's indices and values.

        To find a value in an array, you must know its index. We build the hash to be able to find the indices when we know the value. All that punctuation lets us define and assign to the hash entry at once.

        Good luck.

        Russ