in reply to How do I find if an array has duplicate elements, if so discard it?

I think, as others have suggested, that this is largely dependent on your data structures.
E.g., are we talking about small amounts of data processed infrequently?
Or are we talking about massive amounts of data that are rarely updated, or massive data that's updated regularly?
Here's a breakdown:
-----------

# small data, infrequently updated, order not important
# want simplicity
sub remove_redundant {
  my %hdata = map { ($_, 1 ) } @_;
  return keys %hdata;
}
Note that you might just want to save things in a hash to begin with (and use "keys %data" whenever you want it as an array).
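For example, a quick sanity check of the sub above (repeated here so the snippet runs on its own):

```perl
use strict;
use warnings;

# Same sub as above, repeated so this snippet is standalone.
sub remove_redundant {
  my %hdata = map { ($_, 1) } @_;
  return keys %hdata;
}

my @uniq = remove_redundant(qw(a b a c b));
# @uniq now holds the 3 distinct values; hash keys come back
# in no particular order, so sort before comparing or printing.
```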

------------
# small data, often updated (more reads than writes)
Keep in a hash and convert to a temp array when needed.

$data{$val} = 1;

for my $el ( keys %data ) { ... }

You can even create an overloaded array object which is really just a hash.
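A minimal sketch of what such a wrapper might look like (the class and method names here are my own invention, not an existing module):

```perl
use strict;
use warnings;

# Hypothetical wrapper: array-like interface, hash storage, so
# inserts are O(1) and duplicates are discarded automatically.
package UniqArray;

sub new { my $class = shift; bless { seen => {} }, $class }

sub insert {    # silently ignores values already present
  my ( $self, @vals ) = @_;
  $self->{seen}{$_} = 1 for @vals;
  return $self;
}

sub elements { return keys %{ $_[0]{seen} } }    # order unspecified

package main;

my $set = UniqArray->new;
$set->insert(qw(foo bar foo baz));
my @uniq = $set->elements;    # 3 distinct values
```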

------------
# for large data that's rarely updated
push @data, $val if ! grep { $_ eq $val } @data;

It's a linear search, but if we're talking megs of data here, this is MUCH better than building a
HUGE intermediate hash, then having to garbage collect it afterwards.
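One refinement worth noting: grep walks the whole array even after it finds a match, so for a pure membership test, List::Util's none (available in List::Util 1.33 and later) can bail out at the first hit:

```perl
use strict;
use warnings;
use List::Util qw(none);    # none() needs List::Util 1.33+

my @data = qw(alpha beta gamma);

for my $val (qw(beta delta)) {
  # none{} stops scanning at the first match, unlike grep,
  # which always examines every element.
  push @data, $val if none { $_ eq $val } @data;
}
# @data is now qw(alpha beta gamma delta): beta was a
# duplicate and was discarded, delta was appended.
```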
-----------
# for large data that's often updated (more reads than writes)
It's worthwhile building an overloaded class that stores the data as a hash.
Ideally just use a hash outright (except that that'll waste a lot of memory).
It's better to waste this memory up front than to fragment your dynamic memory
pool by constantly generating intermediate memory scratch pads.

There is one final solution, and that is to maintain sorted data, then implement a
C function to perform insertions efficiently. You can either use a red-black tree or
insertion into a sorted array. You could get very creative, and it would obviously be of general
utility (since you're storing scalars). There might be CPAN modules for this already. The red-black tree is a good compromise
between performance and memory use, whereas insertion into a sorted array will give you the best
memory utilization.
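A rough pure-Perl illustration of the sorted-array variant (a C implementation would follow the same logic, just faster):

```perl
use strict;
use warnings;

# Binary-search the sorted array for $val; insert it at the
# right spot unless it's already there. O(log n) search,
# O(n) splice for the actual insertion.
sub sorted_insert {
  my ( $aref, $val ) = @_;
  my ( $lo, $hi ) = ( 0, scalar @$aref );
  while ( $lo < $hi ) {
    my $mid = int( ( $lo + $hi ) / 2 );
    if    ( $aref->[$mid] lt $val ) { $lo = $mid + 1 }
    elsif ( $aref->[$mid] gt $val ) { $hi = $mid }
    else                            { return }    # duplicate: discard
  }
  splice @$aref, $lo, 0, $val;
}

my @sorted;
sorted_insert( \@sorted, $_ ) for qw(m c x c a m);
# @sorted is now qw(a c m x): always sorted, duplicates dropped.
```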
------------
One possible mechanism for using a hash as the array is the following:

our %hash_cache;
sub hash_array_push {
  my ( $hash_name, @vals ) = @_;
  for my $val ( @vals ) {
    $hash_cache{$hash_name}{$val} = 1;
  }
}

sub hash_array_del {
  my ($hash_name) = @_;
  delete $hash_cache{$hash_name};
}

sub hash_array_getref {
  my ( $hash_name) = @_;
  # return the underlying hash ref; the caller can
  # take keys() when a list is needed, which is cheaper
  # than copying the whole list out of this sub.
  return $hash_cache{$hash_name};
}
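Put together, usage looks something like this (subs repeated in compact form so the snippet runs standalone):

```perl
use strict;
use warnings;

our %hash_cache;

sub hash_array_push {
  my ( $hash_name, @vals ) = @_;
  $hash_cache{$hash_name}{$_} = 1 for @vals;
}

sub hash_array_del { delete $hash_cache{ $_[0] } }

sub hash_array_getref { return $hash_cache{ $_[0] } }

hash_array_push( 'colors', qw(red blue red green) );
my @colors = keys %{ hash_array_getref('colors') };    # 3 distinct values

hash_array_del('colors');
# hash_array_getref('colors') now returns undef.
```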
Yes, there's additional function-call overhead for these subs, but you'd have that with just
about any solution aside from manually maintaining the hash. Alternatively you could wrap this in an OO interface, but OO in Perl adds slightly more
function-call overhead (which is probably offset by no longer needing the symbolic
hash-table name lookup).