Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

To get the unique elements in an array, we use the following code.

undef %saw;
@saw{@in} = ();
@out = sort keys %saw; # remove sort if undesired

In this code, I cannot understand the meaning of this line: @saw{@in} = ();

Could anyone explain this line?

Edited by Arunbear: Changed title from 'need explanation', as per Monastery guidelines

Re: need explanation of @foo{@bar} = (); (hash slice)
by tilly (Archbishop) on May 08, 2005 at 03:07 UTC
    @saw{@in} is a hash slice, which lets you access many elements of the hash at once. So @saw{@in} = (); assigns the empty list to a list of hash elements: each key in @in is created, and since there are no values to hand out, every one of them gets undef. After that operation the hash has one key per distinct element of @in, each pointing to undef.

    Those values aren't very useful, but all that we're interested in is making a list of keys exist in the hash, so it does what we need.
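
A minimal sketch of what the slice assignment does, assuming a small made-up @in (the sample data and the printf line are illustrative, not part of the original question):

```perl
use strict;
use warnings;

# Hypothetical input with duplicates.
my @in = qw(apple pear apple fig pear);

my %saw;
@saw{@in} = ();    # hash slice: one key per distinct element, every value undef

my @out = sort keys %saw;
print "@out\n";    # apple fig pear

# The key is present, but its value is undef.
my $exists  = exists  $saw{apple} ? 1 : 0;
my $defined = defined $saw{apple} ? 1 : 0;
printf "exists=%d defined=%d\n", $exists, $defined;    # exists=1 defined=0
```

Duplicates collapse because a hash can hold each key only once; the values never matter here.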

Re: need explanation of @foo{@bar} = (); (hash slice)
by davidrw (Prior) on May 08, 2005 at 03:20 UTC
Re: need explanation of @foo{@bar} = (); (hash slice)
by holli (Abbot) on May 08, 2005 at 06:18 UTC
    ... so @saw{@in}=(); is equivalent to $saw{$_}=undef for @in;


    holli, /regexed monk/
Re: need explanation of @foo{@bar} = (); (hash slice)
by tlm (Prior) on May 08, 2005 at 14:00 UTC

    An alternative to what you posted, with the same end result, is

    @saw{@in} = undef;

    Because of autovivification-related bugs, I got into the habit of using defined instead of exists to determine whether a hash contains a certain key. This habit required developing the second habit of using something like

    @saw{@in} = (1) x @in;
    instead of
    @saw{@in} = ();
    As I say, these are just programming habits; with a modicum of vigilance it is perfectly possible to avoid autovivification bugs and still use the @saw{@in} = () idiom. In fact, vigilance would be cheaper than my habit, which costs me a 10% premium in the size of the resulting hashes:
    DB<1> use Devel::Size 'total_size'
    DB<2> @x{1..1000}=()
    DB<3> @y{1..1000}=(1) x 1000
    DB<4> p total_size(\%x)
    41049
    DB<5> p total_size(\%y)
    45049
    I'm willing to pay this price to avoid heart-stopping encounters with lines like this in my code
    if ( exists $some_hash{ foo } ) { launch_the_missiles(); }
    and the resulting frantic code scan to ensure that it doesn't contain anything like
    $x = $some_hash{ maybe_foo() }{ bar };

    Now I use exists only when dealing with the rare hash for which undef is a legitimate value.
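
A small sketch of the habit described above, under assumptions: the data and the key name 'ghost' are hypothetical, and taking a reference to a hash element is used here only as one easy way to make a key exist spuriously with an undef value.

```perl
use strict;
use warnings;

# Hypothetical input.
my @in = qw(a b a c);

my %seen;
@seen{@in} = (1) x @in;    # every listed key gets the defined value 1

# Taking a reference to a missing element creates the key with an undef value.
my $ref = \$seen{ghost};

my $ghost_exists  = exists  $seen{ghost} ? 1 : 0;    # 1: the key is there
my $ghost_defined = defined $seen{ghost} ? 1 : 0;    # 0: but it has no value
my $a_defined     = defined $seen{a}     ? 1 : 0;    # 1: a genuine member

print "ghost: exists=$ghost_exists defined=$ghost_defined\n";
print "a: defined=$a_defined\n";
```

With real members stored as 1, a defined test separates them from keys that merely exist with undef. It does not help when the spurious value is itself defined, such as an autovivified hash reference.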

    Update: A related idiom, which, in addition to removing duplicates, also preserves the original order as much as possible, is:

    my @no_repeats = do { my %h; grep !$h{$_}++, @has_repeats };
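
For illustration, here is that idiom run on a made-up list (the contents of @has_repeats are an assumption):

```perl
use strict;
use warnings;

# Hypothetical input with repeats.
my @has_repeats = qw(b a b c a d);

# $h{$_}++ returns the old count, so !$h{$_}++ is true only the first
# time a value is seen; grep keeps those first occurrences in order.
my @no_repeats = do { my %h; grep !$h{$_}++, @has_repeats };

print "@no_repeats\n";    # b a c d
```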

    the lowliest monk

      Because of autovivification-related bugs, I got into the habit of using defined instead of exists to determine whether a hash contains a certain key.

      Back up one second there. exists exists in Perl's vocabulary for the purpose of determining if a hash element exists. defined doesn't tell you whether or not it exists, it tells you whether or not it has a value set.

      The autovivification problem doesn't occur when you do this:

      print "Exists!\n" if exists $hash{somekey};

      It occurs when you do something like this:

      my %hash = (
          Key1 => { john  => 1, pete   => 2 },
          Key2 => { frank => 3, howard => 4 },
      );
      print "ted Exists!\n"      if exists $hash{Key7}{ted};
      print "Key7 now exists!\n" if exists $hash{Key7};

      The difference is that in order to test for the existence of ted as an element referenced by $hash{Key7}, Perl has to autovivify Key7. That is a big difference. The moral of the story is to always check for the existence of the higher level (if you cannot assume it exists) before checking for the existence of the lower level. But that's no reason to resort to using defined to do the job of exists, and in fact, using defined in this sort of context isn't going to prevent autovivification, since testing for ted's value still requires that Key7 silently spring into existence.
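
One way to sketch the "check the higher level first" advice, reusing the same sample hash (the variable names $guarded and $unguarded are made up for this example):

```perl
use strict;
use warnings;

my %hash = (
    Key1 => { john  => 1, pete   => 2 },
    Key2 => { frank => 3, howard => 4 },
);

# Guarded: && short-circuits, so the nested lookup never runs when
# Key7 is missing and nothing is autovivified.
my $guarded = exists $hash{Key7} && exists $hash{Key7}{ted};
my $keys_after_guarded = join ',', sort keys %hash;
print "after guarded check:   $keys_after_guarded\n";      # Key1,Key2

# Unguarded: reaching through $hash{Key7} creates it as an empty hash ref.
my $unguarded = exists $hash{Key7}{ted};
my $keys_after_unguarded = join ',', sort keys %hash;
print "after unguarded check: $keys_after_unguarded\n";    # Key1,Key2,Key7
```

Both tests report that ted is absent; only the unguarded one leaves Key7 behind.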

      To that point, the "autovivification bugs" issue is not a bug in exists; it's a type of bug that programmers encounter by not understanding how autovivification works.


      Dave

        I did not mean to imply that exists causes autovivification. What I was trying to get at is that autovivification has made me wary of keys that exist spuriously. These can occur in a number of ways, and I'd rather avoid the issue altogether by using defined instead of exists as a matter of policy.

        The point is not whether spurious autovivification can be avoided (it is not hard to do so). The point is that I want to write my code so that, if I find myself debugging it weeks or months later, I don't have to worry over every exists expression that I run across. In other words, the possibility of unintended autovivification renders exists suspect in my eyes, and I'd rather deal with such suspicions as little as possible. A consequence of this is that I don't use undef as a hash value unless it is really necessary, or in very narrowly circumscribed blocks of code, in which one can tell at a glance that unintended autovivification is not a problem. It's all defensive programming, like avoiding globals, for example.

        the lowliest monk