in reply to Removing Duplicates from Array Passed by Reference
From what I could see, none of the answers above actually address the problem of creating huge temporary lists when de-duping very large arrays.
```perl
#! perl -slw
use strict;

sub dedup {
    my $ref = shift;
    my( $i, %h ) = 0;
    while( $i < @$ref ) {
        my $v = $ref->[ $i ];
        unless( exists $h{ $v } ) {
            $h{ $v } = undef;
            $i++;
            next;
        }
        splice @{ $ref }, $i, 1;
    }
}

my @a = ( 7, 20 .. 30, 1 .. 1000, 200 .. 300, 7 );
print scalar @a;
dedup \@a;
print scalar @a;
#print "@a";
```
This avoids the temporary lists by scanning the array in place, noting what it has seen in a hash, and splicing out any duplicates as it goes.
If your data tends to have long runs of duplicates, as in the example above, you can add logic to defer the splice until a run of dups ends and then splice out the whole run in a single hit. That is somewhat more efficient, at the cost of extra complexity.
```perl
#! perl -slw
use strict;

sub dedup {
    my $ref = shift;
    my( $i, $count, %h ) = ( 0, 0 );
    while( $i < @$ref ) {
        my $v = $ref->[ $i ];
        unless( exists $h{ $v } ) {
            $h{ $v } = undef;
            $i++;
            next unless $count;
            ## A run of $count dups just ended; splice it out in one hit.
            $i -= $count;
            splice @{ $ref }, $i - 1, $count;
            $count = 0;
            next;
        }
        $count++;
        $i++;
    }
    ## Flush any run of dups left at the end of the array.
    $i -= $count;
    splice @{ $ref }, $i, $count;
}

my @a = ( 5, 1 .. 2, 1 .. 10, 2 .. 5, 7 );
print scalar @a, ":@a";
dedup \@a;
print scalar @a, ":@a";
```
I can't help but think that some of the extra complexity and duplication could be factored out, but I haven't worked out how. (Yet:).
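One way the duplication might be factored out (a sketch only, not a worked-through solution from the thread): pull the "splice out the pending run" step into a small closure, so the splice logic appears exactly once and is called both when a run ends mid-scan and once more after the loop.

```perl
#! perl -slw
use strict;

sub dedup {
    my $ref = shift;
    my( $i, $count, %h ) = ( 0, 0 );

    ## Remove the pending run of $count duplicates ending just before $i.
    my $flush = sub {
        return unless $count;
        $i -= $count;
        splice @$ref, $i, $count;
        $count = 0;
    };

    while( $i < @$ref ) {
        my $v = $ref->[ $i ];
        if( exists $h{ $v } ) {
            ## Extend the current run of duplicates.
            $count++;
        }
        else {
            ## Run (if any) is over; splice it out, then keep this value.
            $flush->();
            $h{ $v } = undef;
        }
        $i++;
    }
    $flush->();    ## Catch a run that reaches the end of the array.
}

my @a = ( 5, 1 .. 2, 1 .. 10, 2 .. 5, 7 );
print scalar @a, ":@a";    # 18:5 1 2 1 2 3 4 5 6 7 8 9 10 2 3 4 5 7
dedup \@a;
print scalar @a, ":@a";    # 10:5 1 2 3 4 6 7 8 9 10
```

The closure captures `$i`, `$count` and `$ref`, so the off-by-one bookkeeping around the splice lives in one place instead of two.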
Replies are listed 'Best First'.

- Re: Re: Removing Duplicates from Array Passed by Reference
  by Util (Priest) on May 17, 2003 at 03:24 UTC
  - by BrowserUk (Patriarch) on May 17, 2003 at 04:48 UTC
- Re: Re: Removing Duplicates from Array Passed by Reference
  by Skeeve (Parson) on May 19, 2003 at 08:32 UTC
  - by BrowserUk (Patriarch) on May 19, 2003 at 10:06 UTC
  - by Skeeve (Parson) on May 19, 2003 at 11:40 UTC