in reply to How can I make a string unique (quicker than my approach at least)

I recently saw a very short piece of code that removes repeated items from a list. I believe it was right here:
https://stackoverflow.com/questions/7651/how-do-i-remove-duplicate-items-from-an-array-in-perl

After wrapping it into a sub, it looks like this:

    # Usage: LIST = RemoveDuplicates(LIST)
    sub RemoveDuplicates {
        my %seen;
        grep !$seen{$_}++, @_;
    }

Maybe somebody who is more knowledgeable can explain how this works, because I don't understand it myself. I just know it works. I tested it.


Re^2: How can I make a string unique (quicker than my approach at least)
by Athanasius (Archbishop) on Apr 03, 2024 at 04:21 UTC

    Hello harangzsolt33,

    Maybe somebody ... can explain how this works

    Since there is no explicit return, the sub returns the value of its final statement, namely grep !$seen{$_}++, @_;. @_ contains the arguments passed into the sub, and grep filters out those elements that do not make the expression !$seen{$_}++ true. So let’s look at that expression in detail.
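    A minimal sketch of the implicit return, using the sub from the parent post unchanged. Because the grep is the final statement, it is evaluated in whatever context the sub was called in: a list assignment gets the surviving items, while a scalar assignment gets grep's scalar-context result, which is the number of items kept. The sample input is made up for illustration.

```perl
use strict;
use warnings;

# Same sub as in the parent post: no explicit return, so the value of
# the final statement (the grep) is what the caller receives.
sub RemoveDuplicates {
    my %seen;
    grep !$seen{$_}++, @_;
}

my @list  = RemoveDuplicates(qw(a b a c b));   # list context: the kept items
my $count = RemoveDuplicates(qw(a b a c b));   # scalar context: how many were kept

print "@list\n";    # a b c
print "$count\n";   # 3
```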

    %seen is a hash, initially empty. Merely reading an element that does not yet exist leaves the hash unchanged, but modifying one (as ++ does here) autovivifies it. So if $_ is 'x' and the hash has no 'x' key, a hash element is created with key 'x', starting from undef.

    Now the clever part: postfix ++ increments an item’s value, but the expression yields the value the item had before the increment. Further, incrementing undef produces the value 1, because undef is taken to be zero (for the same reason, perlop documents that a postfix ++ on undef yields 0 rather than undef). So if the current value of $_ is not already in the hash %seen, the expression !$seen{$_}++ autovivifies a hash element with key $_, leaves 1 stored in it, and applies the logical negation operator ! to the pre-increment value. Since that value is false, its negation is true and the value of $_ passes through the grep filter into the eventual output of the subroutine.
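    To watch postfix ++ in isolation, here is a small sketch with a made-up key 'x'. It checks that the key is absent to begin with, then captures what two successive postfix increments hand back. On current perls the first one yields 0 (perlop says undef is converted to 0 before incrementing), which is just as false as undef for the purposes of the ! test.

```perl
use strict;
use warnings;

my %seen;

my $exists_before = exists $seen{x} ? 1 : 0;   # key is absent at the start

my $first  = $seen{x}++;   # value before the increment; element now holds 1
my $second = $seen{x}++;   # value before the increment; element now holds 2

printf "exists_before=%d first=%s second=%s stored=%d\n",
    $exists_before, $first, $second, $seen{x};
# exists_before=0 first=0 second=1 stored=2
```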

    But the next time $_ has that value, the hash item $seen{$_} exists and has a value of 1 (from the previous application of postfix ++). And since !1 is false, grep filters this item out. In this way, only the first occurrence of any item passes through the filter. So all repeated items are removed from the original list.
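    The walk-through above can be traced with an explicit loop instead of grep. This is only an illustrative sketch: the input list and the variable names (@kept, $count_before, $keep) are invented here, and the // operator needs perl 5.10 or later.

```perl
use strict;
use warnings;

my %seen;
my @kept;

for my $item (qw(x y x z y x)) {
    my $count_before = $seen{$item} // 0;   # occurrences seen before this one
    my $keep = !$seen{$item}++;             # same test the grep uses
    printf "%-2s seen before: %d  %s\n",
        $item, $count_before, $keep ? 'keep' : 'drop';
    push @kept, $item if $keep;
}

print "@kept\n";   # x y z
```

    Only the first x, y and z are kept; every later occurrence finds a non-zero count already stored and is dropped.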

    Hope that helps,

    Athanasius <°(((><contra mundum סתם עוד האקר של פרל,

Re^2: How can I make a string unique (quicker than my approach at least)
by Timka (Acolyte) on Apr 03, 2024 at 04:14 UTC

    I use that snippet sometimes, since some older versions of List::Util do not include a uniq function.

    The code means:

    sub uniq {
        my %h;              # Keep track of things seen.
        grep {              # 4: Return each item the first time it is seen.
            not $h{$_}++    # 2: Item has not (yet) been seen.
                            # 3: ++ then records that the item was seen.
        } @_;               # 1: For each input...
    }

      Input:

      cat in
      ID1 nick-john-helena
      ID2 george-andreas-lisa-anna-matthew-andreas-lisa
      ID3 olivia-niels-peter-lars-niels-lars-olivia-olivia

      Code:

      perl -MList::Util=uniq -ple '
          s/
              ^           # Beginning of line (BOL).
              \w+         # Any "words".
              \s+         # Any whitespace (like tabs).
              \K          # "Keep" what is to the left.
              (\S+)       # Capture and replace next non-whitespace (words).
          /
              join "-",   # 4: n1-n2-n3
              uniq        # 3: [ "n1", "n2", "n3" ]
              split "-",  # 2: [ "n1", "n2", "n1", "n3" ]
              $1          # 1: n1-n2-n1-n3
          /xe             # Freespace regex and eval replacement.
      ' in

      Output:

      ID1 nick-john-helena
      ID2 george-andreas-lisa-anna-matthew
      ID3 olivia-niels-peter-lars
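      For a List::Util too old to export uniq, the two pieces above could be combined into a plain script along these lines. This is a sketch, not the one-liner itself: uniq_names and dedupe_line are hypothetical helper names, the input lines are hard-coded rather than read from the file, and a second capture group stands in for \K so that it does not depend on perl 5.10.

```perl
use strict;
use warnings;

# Hypothetical stand-in for List::Util::uniq on installations that lack it.
sub uniq_names {
    my %h;
    return grep { !$h{$_}++ } @_;
}

# Hypothetical helper: de-duplicate the dash-separated names after the ID.
sub dedupe_line {
    my ($line) = @_;
    # $1 is the ID plus whitespace (kept as is), $2 is the name list.
    $line =~ s/^(\w+\s+)(\S+)/$1 . join('-', uniq_names(split('-', $2)))/e;
    return $line;
}

my @input = (
    'ID1 nick-john-helena',
    'ID2 george-andreas-lisa-anna-matthew-andreas-lisa',
    'ID3 olivia-niels-peter-lars-niels-lars-olivia-olivia',
);

print dedupe_line($_), "\n" for @input;
```

      Run on the three sample lines, it should print the same three lines as the output listed above.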
Re^2: How can I make a string unique (quicker than my approach at least)
by hippo (Archbishop) on Apr 03, 2024 at 08:14 UTC
    Maybe somebody who is more knowledgeable can explain how this works

    I think the FAQ explains it quite well. I would always look there first in preference to StackOverflow anyway.


    🦛

      Agreed.

      Curiously, in this case the top answer at StackOverflow points to the same FAQ link you pointed to, while the second answer cites perldoc -q duplicate (which emits identical content) and then goes to the bother of embedding brian_d_foy's excellent FAQ entry verbatim in the SO response! So you'd think harangzsolt33 must have seen it (or requires a new pair of glasses). Maybe he can comment further to clear up this mystery. :)

      👁️🍾👍🦟