comparing arrays

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: comparing arrays by ikegami (Patriarch) on Dec 16, 2004 at 01:31 UTC
This will keep the first pair found.

    my %lookup;
    my @to_keep;
    foreach (0..$#array1) {
        my $a1 = $array1[$_];
        my $a2 = $array2[$_];
        next if $lookup{$a1};    # skip the pair if either value was already kept
        next if $lookup{$a2};
        $lookup{$a1} = $lookup{$a2} = 1;
        push(@to_keep, $_);      # remember the index of a pair worth keeping
    }
    @array1 = @array1[@to_keep];
    @array2 = @array2[@to_keep];

The above will yield "interesting" results for

    @array1 = (1, 1, 4); @array2 = (4, 5, 6);  -->  @array1 = (1);    @array2 = (4);
    @array1 = (1, 1, 5); @array2 = (4, 5, 6);  -->  @array1 = (1, 5); @array2 = (4, 6);
Re: comparing arrays by sauoq (Abbot) on Dec 16, 2004 at 01:09 UTC
> When this happens, I simply want to remove one copy of the pair and keep the other (remove one element from each array)

You don't explain which array the element should be removed from. In your example, you show one of a set of duplicates being removed from the first array and one from the other set of duplicates being removed from the other array. Could they always be removed from the same array? Do you wish to alternate, removing from the first, then the other, then the first, and so on? Once you figure that out, it should be pretty easy to do.

Hint: use a hash (or two if necessary). The keys of a hash are unique.

-sauoq "My two cents aren't worth a dime.";
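The hint about hash keys being unique can be sketched as follows. This is only a minimal illustration with made-up sample data (`@values`, `%seen` and `@unique` are hypothetical names, not from the original question): each value becomes a hash key, so a second sighting of the same value is easy to detect.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original question.
my @values = (1, 1, 4, 5, 4, 6);

# A key can exist only once in a hash, so %seen records every value met so far.
# The post-increment returns the old count: 0 (false) on the first sighting.
my %seen;
my @unique = grep { !$seen{$_}++ } @values;

print "@unique\n";
```

The same `%seen` idea extends to two arrays: fill the hash from one array and test the elements of the other against it.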
Re^2: comparing arrays by Anonymous Monk on Dec 16, 2004 at 01:17 UTC
Hi, sorry, I thought I had explained it. I want to remove just one copy of the duplicate pair, i.e. one value from each array. It doesn't matter which array the values are removed from. I don't see how a hash would work: it would help extract the unique values in each array, but how could I use it to keep one copy of the duplicate values? Thanks!
Re^3: comparing arrays by sauoq (Abbot) on Dec 16, 2004 at 01:55 UTC
> it doesn't matter which array the values are removed from.

In that case, it is very simple. You iterate over one array and rebuild it. If a value shows up in the second array, you just ignore it as you are rebuilding. Use a hash to store the values so that lookup is fast...

    my @array1 = (1, 2, 3, 4, 5);
    my @array2 = (2, 4, 6, 8, 10);
    my %hash = map {$_=>1} @array2;                     # values of @array2 as keys
    @array1 = grep { not exists $hash{$_} } @array1;    # drop anything also in @array2
    print "@array1\n";

-sauoq "My two cents aren't worth a dime.";
Re^4: comparing arrays by Animator (Hermit) on Dec 16, 2004 at 15:16 UTC
Re: comparing arrays by prasadbabu (Prior) on Dec 16, 2004 at 01:30 UTC
I think List::Compare may help you to do this. Prasad
Re: comparing arrays by ikegami (Patriarch) on Dec 16, 2004 at 01:46 UTC
Or maybe you want to just skip single elements, without caring if you end up with pairs or not.

    @array1 = qw(1 1 5 3);
    @array2 = qw(4 5 6 1);
    my %lookup;
    sub filter {
        return 0 if $lookup{$_};    # already seen in either array
        $lookup{$_} = 1;
        return 1;
    }
    @array1 = grep filter, @array1;
    @array2 = grep filter, @array2;
    print('@array1 = (', join(', ', @array1), ")\n");  # 1, 5, 3
    print('@array2 = (', join(', ', @array2), ")\n");  # 4, 6
Re: comparing arrays by nedals (Deacon) on Dec 16, 2004 at 07:32 UTC
I'm reading this differently...

> I simply want to remove one copy of the pair

This would indicate that pairs somehow got reversed and duplicated. So duplicate pairs need to be removed, resulting in 2 equal-length arrays.

    use strict;
    my @dataA = qw(1 9 3 5 4 2);
    my @dataB = qw(3 2 1 6 7 9);
    my $i = 0;
    foreach my $num (@dataA) {
        foreach (@dataB) {
            if ($num == $_) {
                splice(@dataA,$i,1);
                splice(@dataB,$i,1);
            }
        }
        $i++;
    }
    print "@dataA\n@dataB\n";
Re: comparing arrays by Anonymous Monk on Dec 17, 2004 at 00:26 UTC
I got the same set of pairs left in the arrays as TedPride, but I found a problem with his code: he splices from the front of the array, which I believe makes the indexes further along wrong. It worked OK with this data set only because the one duplicate pair to be spliced from the arrays is the last element. I got warnings when I moved the duplicate from the last position in the arrays to the next-to-last. But for some reason, the data came out OK (but with the warnings)! I eliminated the warnings by changing the for loop to `for (reverse 0..$#array1)`.

Ned's works, but I don't know why indexing a higher number ($i++) after a splice doesn't cause problems as the array keeps getting resized with splice. Hope someone might know. A link from c.l.p.m., http://groups-beta.google.com/group/comp.lang.perl.misc/msg/49831a95770a2ee5, discusses this, and a few links in my Perl Monks search seemed to recommend counting backwards in the for loop.

My solution walks from the end of the array to the front, splicing when duplicate pairs are found.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @a1 = qw(13470660 13471850 14028274 14028286);
    my @a2 = qw(14028145 14028286 13476691 13471850);

    my %hash;

    for (reverse 0..$#a1) {
        # Does 2 checks: whether the number from the second array was
        # already seen in the first array, and whether the pair is a
        # 'reversal' (flip-flop) of an earlier pair and thus a duplicate.
        if (exists $hash{$a2[$_]} && $hash{$a2[$_]} == $a1[$_]) {
            splice @a1, $_, 1;
            splice @a2, $_, 1;
        }
        else {
            $hash{$a1[$_]} = $a2[$_];
        }
    }
    print "@a1\n@a2\n";

Chris
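The difference between splicing while counting forwards and while counting backwards can be seen in a small sketch. The data and the "remove every 0" rule are invented purely for illustration, and the `next if` guard in the forward loop is an addition that keeps the loop from reading past the shrunken array.

```perl
use strict;
use warnings;

# Hypothetical data: the goal is to remove every element equal to 0.
my @forward = (0, 0, 1, 0, 2);
my @reverse = @forward;

# Forward: after a splice the next element slides into the current index,
# so the loop steps over it without ever testing it.
for my $i (0 .. $#forward) {
    next if $i > $#forward;    # the array has shrunk since the range was built
    splice(@forward, $i, 1) if $forward[$i] == 0;
}

# Reverse: a splice only shifts elements the loop has already visited.
for my $i (reverse 0 .. $#reverse) {
    splice(@reverse, $i, 1) if $reverse[$i] == 0;
}

print "forward: @forward\n";    # one 0 survives
print "reverse: @reverse\n";    # all 0s removed
```

In the forward loop the second 0 moves into index 0 after the first splice and is never examined again, which is the kind of skipped element the post above is worried about.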
Re^2: comparing arrays by nedals (Deacon) on Dec 17, 2004 at 06:55 UTC
> Ned's works but I don't know why indexing a higher number, $i++, after a splice doesn't cause problems as the array keeps getting resized with splice. Hope someone might know.

The answer lies in the result..

    1 9 3 5 4 2
    3 2 1 6 7 9

At i=0, the inner foreach loop takes out the first match set at index 0, and increments to 1.

    9 3 5 4 2
    2 1 6 7 9

Then at i=4, instead of taking out the first match set it takes out the second.

    9 3 5 4
    2 1 6 7
Re: comparing arrays by TedPride (Priest) on Dec 16, 2004 at 19:53 UTC
How large are the arrays going to be, and how many dupes are there likely to be? My solution below assumes that the arrays are fairly small and won't suffer much from being modified in place. I'm also assuming that you only have pairs (both arrays same length, no missing array cells). The interesting thing about my solution is that it returns dupe counts, so you could theoretically even sort dupes by the number of times they appear.

    use strict;
    use warnings;

    my @array1 = qw(13470660 13471850 14028274 14028286);
    my @array2 = qw(14028145 14028286 13476691 13471850);
    my (%keys, $key, @dupes);

    for (0..$#array1) {
        # Build the key with the smaller value first, so that a pair
        # and its reversal produce the same key.
        if ($array1[$_] < $array2[$_]) {
            $key = "$array1[$_] $array2[$_]";
        }
        else {
            $key = "$array2[$_] $array1[$_]";
        }
        # The second time a key is seen, remove that pair from both arrays.
        if ($keys{$key}++ == 1) {
            splice(@array1, $_, 1);
            splice(@array2, $_, 1);
            push(@dupes, $key);
        }
    }

    for (0..$#array1) {
        print "$array2[$_] $array1[$_]\n";
    }
    print "\n$_ ".($keys{$_}-1) for (@dupes);
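The order-independent key used above can also be combined with the index-list approach from earlier in the thread, which avoids splicing inside the loop altogether. The following is a sketch of that combination, not TedPride's code; `%count` and `@keep` are names chosen here for illustration, and it assumes both arrays have the same length and hold numeric values.

```perl
use strict;
use warnings;

my @array1 = qw(13470660 13471850 14028274 14028286);
my @array2 = qw(14028145 14028286 13476691 13471850);

my (%count, @keep);
for my $i (0 .. $#array1) {
    # Sort the two values so (a, b) and (b, a) produce the same key.
    my $key = join ' ', sort { $a <=> $b } $array1[$i], $array2[$i];
    push @keep, $i unless $count{$key}++;    # keep only the first sighting
}

# Rebuild both arrays from the kept indexes instead of splicing in place.
@array1 = @array1[@keep];
@array2 = @array2[@keep];

print "$array1[$_] $array2[$_]\n" for 0 .. $#array1;
```

Because nothing is removed while the loop runs, the indexes stay valid throughout, and `%count` still holds how many times each pair appeared.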