A more complete Unique()

An interesting note was made in the Chatterbox: the idiomatic way of deleting duplicates won't work if you have both undef and the empty string ("") as values. This is because a hash key is always a string, and undef stringifies to the empty string, so the two end up as the same key.

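It also won't work for an array of references, for the same reason: a reference used as a hash key is stringified, so what comes back is a plain string rather than the reference. (It won't work for an ordered array for a different reason: a hash returns its keys in no particular order.)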
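To make the three failure modes concrete, here is a minimal sketch. It assumes the "idiomatic way" meant above is the plain hash-keys approach (the post does not show it), and the sample data is invented for illustration.

```perl
use strict;
use warnings;

# Assumed form of the idiom: use the values as hash keys, then read the keys back.
my @array = ('b', undef, '', 'a', 'b');

my %seen;
{
    no warnings 'uninitialized';    # undef as a hash key warns under "use warnings"
    @seen{@array} = ();
}
my @unique = keys %seen;

# undef and '' both became the key '', so only 'b', '' and 'a' survive,
# and keys() hands them back in no particular order.
print scalar(@unique), " distinct keys\n";

# A reference used as a key is stored as its string form, not as a reference.
my $ref    = [1, 2, 3];
my %by_ref = ($ref => 1);
my ($key)  = keys %by_ref;
print ref($key) ? "still a reference\n" : "plain string: $key\n";
```

The undef is lost entirely (it comes back as ''), and the string form of a reference cannot be dereferenced again.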

In the spirit of fixing both issues, I offer the following:

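sub unique
{
    # Expects a reference to the array to de-duplicate.
    my ($arr) = shift;

    # Maps each distinct value to [ index of first occurrence, original value ].
    my %x;
    for my $index (0 .. $#$arr)
    {
        my $val = $arr->[$index];

        # undef gets its own sentinel key so it isn't confused with "".
        unless (defined $val)
        {
            $x{__NOT_DEFINED__} ||= [
                $index,
                undef,
            ];
            next;
        }

        # The original value is stored, so references come back as references.
        $x{$val} ||= [
            $index,
            $val,
        ];
    }

    # Restore first-seen order, then return the original values.
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } values %x;
}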

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.


Replies are listed 'Best First'.
Re: A more complete Unique()
by Aristotle (Chancellor) on Jan 23, 2003 at 22:17 UTC

Very complicated, and it breaks if a literal `__NOT_DEFINED__` appears in the input data. Contrary to the posts so far, the standard idiom is not just using a hash, but combining it with grep:

my %seen;
my @unique = grep !$seen{$_}++, @array;

This retains order and does not break references. It does have the problem with empty strings vs undefs cancelling each other out when they (probably) shouldn't, but that's easily fixed:

my (%seen, $seen_undef);
my @unique = grep defined ? !$seen{$_}++ : !$seen_undef++, @array;

Makeshifts last the longest.
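The grep approach with a separate undef counter can be exercised against the awkward inputs discussed in this thread. What follows is only a sketch: the sub name uniq_keep_order and the test data are hypothetical, and the block form of grep with defined($_) is used here for readability rather than the bare expression.

```perl
use strict;
use warnings;

# Hypothetical wrapper name; the thread only gives the bare grep expression.
sub uniq_keep_order {
    my (%seen, $seen_undef);
    # Defined values are tracked by their string form, undef by its own counter,
    # and grep passes the original elements through untouched.
    return grep { defined($_) ? !$seen{$_}++ : !$seen_undef++ } @_;
}

my $ref = [1, 2];
my @in  = ('b', undef, '', $ref, '__NOT_DEFINED__', 'b', undef, $ref, '');
my @out = uniq_keep_order(@in);

# First-seen order is kept: 'b', undef, '', $ref, '__NOT_DEFINED__'
print scalar(@out), " unique values\n";
print "element 3 is still a ", ref($out[3]), " reference\n";
```

Because undef never goes into the hash, no sentinel key is needed, so a literal `__NOT_DEFINED__` string in the data is treated like any other value.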
