
Additionally, List::MoreUtils' uniq() preserves the order of your list. Collecting the elements as keys of a temporary hash and returning keys() is tempting, but it will almost certainly change the order, because hash keys come back in no particular order.
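A minimal sketch of the difference, using only core Perl (the sub names `uniq_keys` and `uniq_ordered` are hypothetical): returning `keys %h` loses the input order, while filtering the input through a `%seen` hash, as perlfaq4 does, keeps the first occurrence of each element in its original position.

```perl
use strict;
use warnings;

# Hypothetical helper: dedupe by returning the hash keys.
# The order of keys() is unrelated to the input order.
sub uniq_keys {
    my %h = map { $_ => 1 } @_;
    return keys %h;
}

# Hypothetical helper: dedupe by filtering the input list.
# grep walks @_ in order, so the first occurrence of each value is kept.
sub uniq_ordered {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @list = qw(pear apple pear fig apple);
print join(' ', uniq_ordered(@list)), "\n";   # pear apple fig
print join(' ', uniq_keys(@list)), "\n";      # same three words, in hash order
```

So a temporary hash is fine for deduplication; it is only returning its keys that scrambles the order.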

Re^3: Remove Duplicates from Array
by linuxer (Curate) on Nov 01, 2008 at 15:17 UTC

    List::MoreUtils::uniq can also be faster, because it tries to load a compiled (C) library via DynaLoader to implement its functionality. If that fails, it falls back to a plain-Perl implementation.
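The same "try the fast version, fall back to plain Perl" idea can be applied in user code. A minimal sketch, assuming only that List::MoreUtils provides `uniq` when it is installed; the fallback sub `pp_uniq` and the variable `$uniq` are hypothetical names, and this probes for the whole module rather than for its C library as the module itself does internally.

```perl
use strict;
use warnings;

# Pure-Perl fallback, equivalent to the perlfaq4 idiom.
sub pp_uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Use List::MoreUtils::uniq if the module can be loaded,
# otherwise fall back to pp_uniq.
my $uniq = eval { require List::MoreUtils; \&List::MoreUtils::uniq }
        || \&pp_uniq;

my @unique = $uniq->(qw(a b a c b));
print "@unique\n";    # a b c
```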

    In my test (Linux, perl 5.8.8, List::MoreUtils 0.21), the original List::MoreUtils::uniq, running on its compiled library, is about 400% faster than my Perl implementation.

    If I rename the library so that List::MoreUtils must fall back on its pure-Perl implementation, my solution is about 20% to 25% faster.

    I don't want to argue against List::MoreUtils, but now I wonder about these two pure-Perl solutions:

    # presented in perlfaq4 - How can I remove duplicate elements from a list or array?
    sub my_uniq {
        my %h;
        grep { !$h{$_}++ } @_;
    }

    # vs.

    # List::MoreUtils::uniq
    sub LM_uniq {
        my %h;
        map { $h{$_}++ == 0 ? $_ : () } @_;
    }

    I can't see any advantage in using map with the ternary operator.
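One way to examine the question is to confirm that both subs return the same list and then time them with the core Benchmark module. A rough sketch; the test data, its size, and the iteration count are arbitrary choices, and relative timings will vary with perl version and input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# perlfaq4 idiom: grep keeps elements whose count was 0 before incrementing.
sub my_uniq { my %h; grep { !$h{$_}++ } @_ }

# List::MoreUtils pure-Perl style: map returns the element or an empty list.
sub LM_uniq { my %h; map { $h{$_}++ == 0 ? $_ : () } @_ }

# Arbitrary test data: 1000 integers drawn from 0..99.
my @data = map { int rand 100 } 1 .. 1000;

# Both should return the same elements in the same order.
my $same = join(',', my_uniq(@data)) eq join(',', LM_uniq(@data));
print $same ? "identical results\n" : "results differ\n";

# Run each sub 2000 times and print a comparison table.
cmpthese(2000, {
    grep_uniq => sub { my @u = my_uniq(@data) },
    map_uniq  => sub { my @u = LM_uniq(@data) },
});
```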


      Well looky there... you know, it never occurred to me to return an empty list from map's block to skip an entry. I was thinking in terms of Lisp, where adding '() gives you a nil entry. I assumed the result would have been to add 0 or something, even though I knew that

      map { ( 0..$_ ) } ( 1..3 );
      yields
      (0, 1, 0, 1, 2, 0, 1, 2, 3)

      Cool.
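A small sketch of the idiom: because map flattens whatever list its block returns, returning `()` drops the element entirely, so map can filter like grep and transform in the same pass. The numbers here are arbitrary.

```perl
use strict;
use warnings;

my @nums = 1 .. 6;

# Returning () contributes nothing to map's result list.
my @odd_via_map  = map  { $_ % 2 ? $_ : () } @nums;   # (1, 3, 5)
my @odd_via_grep = grep { $_ % 2 }           @nums;   # (1, 3, 5)

# map can also filter and transform in one pass.
my @odd_squares = map { $_ % 2 ? $_ * $_ : () } @nums;   # (1, 9, 25)

# A block returning several values adds all of them.
my @ranges = map { ( 0 .. $_ ) } ( 1 .. 3 );   # (0, 1, 0, 1, 2, 0, 1, 2, 3)

print "@odd_via_map | @odd_via_grep | @odd_squares | @ranges\n";
```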

      # List::MoreUtils::uniq
      sub LM_uniq {
          my %h;
          map { $h{$_}++ == 0 ? $_ : () } @_;
      }
      This seems like a strange construction (and yet, as you point out, it's what's in the List::MoreUtils source). Surely $h{$_}++ == 0 is the same as !$h{$_}++ in this setting? I thought the main point of using List::MoreUtils' uniq was to avoid edge cases, but this seems to do nothing but replace a grep with an essentially equivalent map. (In particular, it doesn't do anything to avoid stringification of objects.)
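To illustrate the stringification point: LM_uniq (copied from above) uses each element as a hash key, so elements are compared by their string form, exactly as with the grep version. A minimal sketch; the package `Same` is a hypothetical class whose overloaded stringification makes distinct objects look identical.

```perl
use strict;
use warnings;

# Hypothetical class: every instance stringifies to "same".
package Same;
use overload '""' => sub { 'same' }, fallback => 1;
sub new { my $class = shift; return bless {}, $class }

package main;

sub LM_uniq { my %h; map { $h{$_}++ == 0 ? $_ : () } @_ }

# 1.0 stringifies to "1", so it duplicates 1; '1.0' is a distinct string.
my @nums = LM_uniq(1, '1.0', 1.0, '1');
print scalar(@nums), "\n";            # 2

# Two different objects, one hash key: the second object is dropped.
my @objs = LM_uniq(Same->new, Same->new);
print scalar(@objs), "\n";            # 1
```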

        Well, the module's DESCRIPTION section explicitly tells us two things:

        1. the functions are fairly trivial
        2. their efficiency is due to their implementation in C
        All of the below functions are implementable in only a couple of lines of Perl code. Using the functions from this module however should give slightly better performance as everything is implemented in C. The pure-Perl implementation of these functions only serves as a fallback in case the C portions of this module couldn't be compiled on this machine.

        I'm deducing from the description that the Perl implementations aren't really intended to be that efficient.