
From what I could see, none of the answers above actually address the problem of creating huge temporary lists when de-duping very large arrays.
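For contrast, here is a minimal sketch of the kind of approach I mean. I am assuming the usual grep-plus-hash idiom; the name dedup_grep is hypothetical and used only for comparison. The grep builds a complete list of the kept values before that list is copied back over the original array, which is where the temporary storage comes from.

```perl
use strict;
use warnings;

# dedup_grep is a hypothetical name, shown for comparison only.
sub dedup_grep {
    my $ref = shift;
    my %seen;
    # grep returns a full list of first sightings; that list is then
    # assigned back over the original array.
    @$ref = grep { !$seen{$_}++ } @$ref;
    return;
}

my @big = (7, 20..30, 1..1000, 200..300, 7);
dedup_grep \@big;
print scalar( @big ), "\n";
```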

#! perl -slw
use strict;

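sub dedup {
    my $ref = shift;
    my ($i, %h) = 0;           # $i: current index; %h: values seen so far
    while( $i < @$ref ) {
        my $v = $ref->[$i];
        unless( exists $h{$v} ) {
            $h{$v}=undef;      # first sighting: remember it and move on
            $i++;
            next;
        }
        # Duplicate: remove it in place. $i stays put because the
        # next element has just shifted down into this slot.
        splice @{$ref}, $i, 1;
    }
}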
my @a = (7, 20..30, 1..1000, 200..300, 7);

print scalar @a;
dedup \@a;
print scalar @a;
#print "@a";

This avoids the temporary lists by scanning the array, noting what it has seen, and splicing out any duplicates as it goes.

If your data tends to have long runs of duplicates, as in the example above, you can defer the splice until a run of dups ends and then remove the whole run in a single hit. That is somewhat more efficient, at the cost of extra complexity.

#! perl -slw
use strict;

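sub dedup {
    my $ref = shift;
    my ($i, $count, %h) = (0, 0);   # $count: length of the current run of dups
    while( $i < @$ref ) {
        my $v = $ref->[$i];
        unless( exists $h{$v} ) {
            $h{$v}=undef;
            $i++;
            next unless $count;
            # A run of $count dups has just ended. It sits immediately
            # before the new value, so remove the whole run in one splice.
            $i -= $count;
            splice @{$ref}, $i-1, $count;
            $count = 0;
            next;
        }
        $count++;                   # duplicate: extend the run, remove it later
        $i++;
    }
    # Remove any run of dups left at the end of the array.
    $i -= $count;
    splice @{$ref}, $i, $count;
}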
my @a = (5, 1..2, 1..10, 2..5, 7);

print scalar @a, ":@a";
dedup \@a;
print scalar @a, ":@a";

I can't help but think that some of the extra complexity and duplication could be factored out, but I haven't worked out how. (Yet:).
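One possible way to factor it, offered as a sketch and not as a drop-in for the code above: this swaps the technique, replacing splice with in-place compaction. A read index walks the array, a write index marks where the next first-seen value goes, and the leftover tail is truncated once at the end. The name dedup_compact and the variables $r and $w are hypothetical.

```perl
use strict;
use warnings;

# Sketch only: in-place compaction with separate read and write positions.
# dedup_compact is a hypothetical name, not part of the code above.
sub dedup_compact {
    my $ref = shift;
    my( $w, %h ) = 0;                # $w: next slot to keep a value in
    for( my $r = 0; $r < @$ref; $r++ ) {
        my $v = $ref->[$r];
        next if exists $h{$v};       # duplicate: skip it, the slot is reused or truncated
        $h{$v} = undef;
        $ref->[$w++] = $v;           # first sighting: move it down to the write slot
    }
    $#{$ref} = $w - 1;               # drop everything past the last kept value
    return;
}

my @a = (5, 1..2, 1..10, 2..5, 7);
print scalar( @a ), ":@a\n";
dedup_compact \@a;
print scalar( @a ), ":@a\n";
```

Runs of duplicates need no special handling here, because each kept value is copied at most once and nothing is shifted repeatedly. The trade-off is that every kept element after the first duplicate gets reassigned, so whether it beats the splice versions on real data would need measuring.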

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

In reply to Re: Removing Duplicates from Array Passed by Reference by BrowserUk
in thread Removing Duplicates from Array Passed by Reference by arunhorne
