Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I'm getting a bit confused!

I simply have two arrays of numbers which are 'paired' together based on element number, e.g. $array1[0] and $array2[0] are a pair etc.

My problem is that, although each array holds unique numbers, these numbers are sometimes present in both arrays. When this happens, I simply want to remove one copy of the pair and keep the other (remove one element from each array)

Here's an example:

@array1 = qw(13470660 13471850 14028274 14028286);
@array2 = qw(14028145 14028286 13476691 13471850);

# where the pairs are:
13470660 14028145
13471850 14028286
14028274 13476691
14028286 13471850

# I want the result to be the following because 14028286 13471850 occur in each:
13470660 14028145
13471850 14028286
14028274 13476691

Thanks!

Edited by davido: Wrapped $arrayn[0] notation in code tags.

Replies are listed 'Best First'.
Re: comparing arrays
by ikegami (Patriarch) on Dec 16, 2004 at 01:31 UTC

    This will keep the first pair found.

    my %lookup;
    my @to_keep;

    foreach (0..$#array1) {
        my $a1 = $array1[$_];
        my $a2 = $array2[$_];

        next if $lookup{$a1};
        next if $lookup{$a2};

        $lookup{$a1} = $lookup{$a2} = 1;
        push(@to_keep, $_);
    }

    @array1 = @array1[@to_keep];
    @array2 = @array2[@to_keep];

    The above will yield "interesting" results for

    • @array1 = (1, 1, 4); @array2 = (4, 5, 6); --> @array1 = (1); @array2 = (4);
    • @array1 = (1, 1, 5); @array2 = (4, 5, 6); --> @array1 = (1, 5); @array2 = (4, 6);
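
The two cases above can be checked with a small sketch. The sub name `keep_first_pairs` and its array-reference interface are assumptions made for illustration; the loop inside is the same logic as the code above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: returns the kept pairs as two array refs.
sub keep_first_pairs {
    my ($a1, $a2) = @_;
    my (%lookup, @to_keep);
    for my $i (0 .. $#$a1) {
        my ($x, $y) = ($a1->[$i], $a2->[$i]);
        next if $lookup{$x} or $lookup{$y};   # either value already belongs to a kept pair
        $lookup{$x} = $lookup{$y} = 1;
        push @to_keep, $i;
    }
    return ([ @{$a1}[@to_keep] ], [ @{$a2}[@to_keep] ]);
}

my ($r1, $r2) = keep_first_pairs([1, 1, 4], [4, 5, 6]);
print "(@$r1) (@$r2)\n";   # (1) (4)

($r1, $r2) = keep_first_pairs([1, 1, 5], [4, 5, 6]);
print "(@$r1) (@$r2)\n";   # (1 5) (4 6)
```

In the first case the pair (4, 6) is dropped only because 4 already appeared as the partner of 1, which is what makes the result "interesting".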
Re: comparing arrays
by sauoq (Abbot) on Dec 16, 2004 at 01:09 UTC
    When this happens, I simply want to remove one copy of the pair and keep the other (remove one element from each array)

    You don't explain which array the element should be removed from. In your example, you show one of a set of duplicates being removed from the first array and one from the other set of duplicates being removed from the other array. Could they always be removed from the same array? Do you wish to switch off and remove from first one, then the other, then the first, etc.?

    Once you figure that out, it should be pretty easy to do. Hint: use a hash (or two if necessary.) The keys of a hash are unique.

    -sauoq
    "My two cents aren't worth a dime.";
    
      Hi, sorry, I thought I had explained it. I want to remove just one copy of the duplicate pair, e.g. one value from each array - it doesn't matter which array the values are removed from. I don't see how a hash would work - it would help extract the unique values in each array, but how could I use it to keep one copy of the duplicate values? Thanks!
        it doesn't matter which array the values are removed from.

        In that case, it is very simple. You iterate over one array and rebuild it. If a value shows up in the second array, you just ignore it as you are rebuilding. Use a hash to store the values so that lookup is fast...

        my @array1 = (1, 2, 3, 4, 5);
        my @array2 = (2, 4, 6, 8, 10);

        my %hash = map {$_=>1} @array2;
        @array1 = grep { not exists $hash{$_} } @array1;

        print "@array1\n";

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: comparing arrays
by prasadbabu (Prior) on Dec 16, 2004 at 01:30 UTC
Re: comparing arrays
by ikegami (Patriarch) on Dec 16, 2004 at 01:46 UTC

    Or maybe you want to just skip single elements, without caring if you end up with pairs or not.

    @array1 = qw(1 1 5 3);
    @array2 = qw(4 5 6 1);

    my %lookup;

    sub filter {
        return 0 if $lookup{$_};
        $lookup{$_} = 1;
        return 1;
    }

    @array1 = grep filter, @array1;
    @array2 = grep filter, @array2;

    print('@array1 = (', join(', ', @array1), ")\n");  # 1, 5, 3
    print('@array2 = (', join(', ', @array2), ")\n");  # 4, 6
Re: comparing arrays
by nedals (Deacon) on Dec 16, 2004 at 07:32 UTC

    I'm reading this differently...

    I simply want to remove one copy of the pair

    This would indicate that pairs somehow got reversed and duplicated.
    So duplicate pairs need to be removed, resulting in 2 equal length arrays.

    use strict;

    my @dataA = qw(1 9 3 5 4 2);
    my @dataB = qw(3 2 1 6 7 9);
    my $i = 0;

    foreach my $num (@dataA) {
        foreach (@dataB) {
            if ($num == $_) {
                splice(@dataA,$i,1);
                splice(@dataB,$i,1);
            }
        }
        $i++;
    }

    print "@dataA\n@dataB\n";
Re: comparing arrays
by Anonymous Monk on Dec 17, 2004 at 00:26 UTC
    I got the same set of pairs left in the arrays as TedPride, but I believe there is a problem with his code: he splices while walking the arrays from the front, which makes the indexes further along wrong. It worked OK with this data set only because the one duplicate pair to be spliced out is the last element. When I moved the duplicate from the last position in the arrays to the next-to-last, I got warnings, although for some reason the data still came out OK. I eliminated the warnings by changing the for loop to

    for (reverse 0..$#array1)
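
The difference between splicing while counting up and while counting down can be seen in a small sketch. The sub `drop_even` and its test data are made up for illustration and are not from any of the posted solutions.

```perl
use strict;
use warnings;

# Hypothetical helper: delete even numbers by index, visiting indexes in the given order.
sub drop_even {
    my ($order, @n) = @_;
    my @idx = $order eq 'reverse' ? reverse(0 .. $#n) : (0 .. $#n);
    for my $i (@idx) {
        next unless defined $n[$i];            # forward order can run past the shrunken end
        splice(@n, $i, 1) if $n[$i] % 2 == 0;  # later elements shift down by one
    }
    return @n;
}

print join(' ', drop_even('forward', 2, 4, 6, 1)), "\n";   # 4 1  (4 was skipped)
print join(' ', drop_even('reverse', 2, 4, 6, 1)), "\n";   # 1
```

Counting up, each splice moves the next element into the slot that was just visited, so that element is never examined. Counting down, a splice only shifts elements that have already been visited.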

    Ned's works, but I don't know why indexing a higher number ($i++) after a splice doesn't cause problems, as the array keeps getting resized with splice. Hope someone might know. A link from c.l.p.m.,
    http://groups-beta.google.com/group/comp.lang.perl.misc/msg/49831a95770a2ee5
    discusses this, and there were a few links in my Perl Monks search which seemed to indicate counting backwards in the for loop.

    My solution walked from the end of the array to the front, splicing when duplicate pairs were found.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @a1 = qw(13470660 13471850 14028274 14028286);
    my @a2 = qw(14028145 14028286 13476691 13471850);
    my %hash;

    for (reverse 0..$#a1) {
        # Does 2 checks. To see if a number from the second array was
        # already seen in the first array. Also, checks to see if it's
        # a 'reversal' or flip flop and thus a duplicate.
        if (exists $hash{$a2[$_]} && $hash{$a2[$_]} == $a1[$_]) {
            splice @a1, $_, 1;
            splice @a2, $_, 1;
        }
        else {
            $hash{$a1[$_]} = $a2[$_];
        }
    }

    print "@a1\n@a2\n";
    Chris
      Ned's works but I don't know why indexing a higher number, $i++, after a splice doesn't cause problems as the array keeps getting resized with splice. Hope someone might know.

      The answer lies in the result.
      1 9 3 5 4 2
      3 2 1 6 7 9

      At i=0, the inner foreach loop finds the first match and removes the pair at index 0; i then increments to 1.
      9 3 5 4 2
      2 1 6 7 9

      Then at i=4, instead of removing the first copy of the matched pair (9 2), it removes the second copy (2 9) at index 4.
      9 3 5 4
      2 1 6 7
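
One way to sidestep index shifting altogether is to collect the indexes worth keeping and take array slices at the end, with no splice inside the loop. This is only a sketch: the sub name `drop_reversed_pairs` is hypothetical, and it assumes that (x, y) and (y, x) count as the same pair and that the first copy is the one to keep.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first copy of each pair, treating (x, y) and (y, x) as equal.
sub drop_reversed_pairs {
    my ($a1, $a2) = @_;
    my %seen;
    my @keep = grep {
        # Order-independent key: smaller number first.
        my $key = join ' ', sort { $a <=> $b } $a1->[$_], $a2->[$_];
        !$seen{$key}++;    # true only the first time this key is seen
    } 0 .. $#$a1;
    return ([ @{$a1}[@keep] ], [ @{$a2}[@keep] ]);
}

my ($r1, $r2) = drop_reversed_pairs([qw(1 9 3 5 4 2)], [qw(3 2 1 6 7 9)]);
print "@$r1\n@$r2\n";
# 1 9 5 4
# 3 2 6 7
```

The surviving pairs are the same set as in the result above. Only the choice of which copy of each duplicate is kept differs.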

Re: comparing arrays
by TedPride (Priest) on Dec 16, 2004 at 19:53 UTC
    How large are the arrays going to be, and how often / how many dupes are there likely to be? My solution below assumes that the arrays are fairly small and won't suffer much from being modified in place. I'm also assuming that you only have pairs (both arrays same length, no missing array cells).

    The interesting thing about my solution is that it returns dupe counts, so you could theoretically even sort dupes by the number of times they appear.

    use strict;
    use warnings;

    my @array1 = qw(13470660 13471850 14028274 14028286);
    my @array2 = qw(14028145 14028286 13476691 13471850);
    my (%keys, $key, @dupes);

    for (0..$#array1) {
        if ($array1[$_] < $array2[$_]) {
            $key = "$array1[$_] $array2[$_]";
        }
        else {
            $key = "$array2[$_] $array1[$_]";
        }
        if ($keys{$key}++ == 1) {
            splice(@array1, $_, 1);
            splice(@array2, $_, 1);
            push(@dupes, $key);
        }
    }

    for (0..$#array1) {
        print "$array2[$_] $array1[$_]\n";
    }

    print "\n$_ ".($keys{$_}-1) for (@dupes);
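
As a rough sketch of the sorting idea mentioned above, dupes could be ordered by how often they appear. The counts in `%keys` here are made up for illustration; in the code above they would come from the `$keys{$key}++` bookkeeping.

```perl
use strict;
use warnings;

# Made-up counts for illustration: pair key => number of times the pair was seen.
my %keys = ('1 3' => 3, '2 9' => 2, '5 6' => 1);

# A pair is a dupe if it was seen more than once.
my @dupes = grep { $keys{$_} > 1 } keys %keys;

# Most-repeated first; ties broken by key so the order is predictable.
my @by_count = sort { $keys{$b} <=> $keys{$a} or $a cmp $b } @dupes;

print "$_ ", $keys{$_} - 1, "\n" for @by_count;
# 1 3 2
# 2 9 1
```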