Re: slurping into a substring'ed' array

If you need to sort them anyway, an easy way to do it is to sort the array first. Once the array is sorted, identical items sit right next to each other, so to eliminate duplicates you only have to check whether an item is the same as the one immediately preceding it, and if so delete it (or not add it to the results). That's essentially how the Unix utility uniq(1) works.
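
For plain strings, the sort-then-compare-neighbours idea can be sketched inline. This is only a minimal illustration: the sample data and the variable names are made up, and it uses the default string sort rather than a custom comparison.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original thread.
my @words = qw(pear apple fig apple pear kiwi);

my @sorted = sort @words;   # duplicates are now adjacent

my @unique;
foreach my $word (@sorted) {
    # Keep the item only if it differs from the last one kept.
    push @unique, $word
        unless @unique && $unique[-1] eq $word;
}

print "@unique\n";   # prints "apple fig kiwi pear"
```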

Here's some code I use to do this. It takes a comparison coderef as the first parameter, much like sort, and the array to be sorted and uniqified as the second.

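sub uniq(&@)
{
    my($cmpsub, @list)=@_;
    # Test for an empty list explicitly, so that a false first element
    # such as 0 or '' is not mistaken for an empty list.
    return () unless @list;
    # Sort first so that duplicates end up next to each other.
    @list = sort $cmpsub @list;
    my $last = shift @list;
    my @ret =($last);
    foreach my $item (@list)
    {
      # Call the comparison the way sort does, through $a and $b.
      # This assumes uniq is defined in the same package as the caller's block.
      local($a, $b) = ($item, $last);
      push(@ret,$item)
        unless $cmpsub->()==0;
      $last = $item;
    }
    @ret;
}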

My suspicion is that doing this will be slightly faster than using a hash for very large arrays, but I haven't measured it.
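
One way to check that suspicion would be the core Benchmark module. The sketch below is only a guess at how such a comparison might be set up: the sub names uniq_sorted and uniq_hash, the random test data, and the run count are all made up for illustration, and the hash version sorts its output so that both do the same job. The results will depend on the data and the Perl build, so none are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sort, then drop each item equal to the one kept just before it.
sub uniq_sorted {
    my @ret;
    foreach my $item (sort { $a <=> $b } @_) {
        push @ret, $item unless @ret && $ret[-1] == $item;
    }
    return @ret;
}

# Remember seen items in a hash, then sort the survivors.
sub uniq_hash {
    my %seen;
    return sort { $a <=> $b } grep { !$seen{$_}++ } @_;
}

# Hypothetical test data: 50_000 integers with many repeats.
my @data = map { int rand 10_000 } 1 .. 50_000;

# Run each version 10 times and print a comparison table.
cmpthese(10, {
    sorted => sub { my @u = uniq_sorted(@data) },
    hash   => sub { my @u = uniq_hash(@data) },
});
```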
