From what I could see, none of the answers above actually address the problem of creating huge temporary lists when de-duping very large arrays.
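For contrast, the sort of thing I mean is the familiar hash-and-grep idiom (illustrative only, not quoted from any particular answer), which builds a complete temporary list of the unique values before copying it back over the original:

#! perl -slw
use strict;

my @a = ( 7, 20..30, 1..1000, 200..300, 7 );
my %seen;
## grep builds a complete temporary list of the unique values,
## and the assignment then copies that list back over @a.
@a = grep{ !$seen{ $_ }++ } @a;
print scalar @a;

The version below avoids that copy by working on the array in place: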
#! perl -slw
use strict;

sub dedup {
    my $ref = shift;
    my( $i, %h ) = 0;
    while( $i < @$ref ) {
        my $v = $ref->[ $i ];
        unless( exists $h{ $v } ) {
            $h{ $v } = undef;       ## first sighting; remember it and keep the element
            $i++;
            next;
        }
        splice @{ $ref }, $i, 1;    ## duplicate; remove it in place
    }
}
my @a = (7, 20..30, 1..1000, 200..300, 7);
print scalar @a;
dedup \@a;
print scalar @a;
#print "@a";
This avoids the temporary lists by scanning the array, noting what it has already seen, and splicing out any duplicates as it goes.
If your data tends to have long runs of duplicates, as in the example above, you can add logic to defer the splice until a run of dups ends and then remove the whole run in a single hit, which is somewhat more efficient at the cost of some extra complexity.
#! perl -slw
use strict;

sub dedup {
    my $ref = shift;
    my( $i, $count, %h ) = ( 0, 0 );
    while( $i < @$ref ) {
        my $v = $ref->[ $i ];
        unless( exists $h{ $v } ) {
            $h{ $v } = undef;                   ## new value; keep it
            $i++;
            next unless $count;                 ## no pending run of dups to remove
            $i -= $count;                       ## step back over the run that just ended
            splice @{ $ref }, $i-1, $count;     ## remove the whole run in one hit
            $count = 0;
            next;
        }
        $count++;                               ## duplicate; extend the current run
        $i++;
    }
    $i -= $count;                               ## the array may end with a run of dups
    splice @{ $ref }, $i, $count;
}
my @a = (5, 1..2, 1..10, 2..5, 7);
print scalar @a, ":@a";
dedup \@a;
print scalar @a, ":@a";
I can't help but think that some of the extra complexity and duplication could be factored out, but I haven't worked out how. (Yet:).
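One direction that might work is to hoist the "splice out the pending run" step into a small closure shared by the loop body and the end-of-array case. A rough sketch only (mine, not benchmarked, so treat it as a starting point rather than a drop-in replacement):

#! perl -slw
use strict;

sub dedup {
    my $ref = shift;
    my( $i, $count, %h ) = ( 0, 0 );
    my $flush = sub {                           ## remove the pending run of $count dups ending at $i-1
        return unless $count;
        splice @{ $ref }, $i - $count, $count;
        $i -= $count;
        $count = 0;
    };
    while( $i < @$ref ) {
        if( exists $h{ $ref->[ $i ] } ) {
            $count++;                           ## duplicate; extend the current run
        }
        else {
            $flush->();                         ## a run just ended; splice it out
            $h{ $ref->[ $i ] } = undef;         ## new value; keep it
        }
        $i++;
    }
    $flush->();                                 ## the array may end with a run of dups
}

my @a = ( 5, 1..2, 1..10, 2..5, 7 );
print scalar @a, ":@a";
dedup \@a;
print scalar @a, ":@a";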