Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am wondering if there is an easier or more elegant way to delete all the keys in a hash except for n of them (say 5 of them)?

Example code would be:

my $n = 5;
my %hash;
@hash{ 0..20 } = 100..120;
delete @hash{ (keys %hash)[$n..(scalar keys %hash)] };

Is it possible to do it without having to determine the number of keys in the hash first? Ordering and consistency of the results are not important; it just has to leave any n (say 5) of the keys that were originally there. I have been thinking of using the array index -1 somehow, but no luck so far.

Thanks for any insights into this!

Replies are listed 'Best First'.
Re: Delete all hash keys except for n of them
by tachyon (Chancellor) on Oct 25, 2004 at 03:02 UTC

    Why on earth would you want to effectively randomly delete the keys of a hash to get to a fixed number? You won't get the memory back if that is the reason. Are you sure you don't want a stack or a queue or a heap? Just an ordinary array? Here is another way to reduce the key numbers to $n.

    %hash = map{ $_, $hash{$_} }(keys %hash)[0..$n-1];

    cheers

    tachyon

Re: Delete all hash keys except for n of them
by pg (Canon) on Oct 25, 2004 at 03:24 UTC

    blokhead's code suffers greatly in terms of performance, when the hash is big.

    On my old PC, tachyon's code took only 2 seconds to run on a hash with 100_001 elements. My version below performs the same as tachyon's because the idea is the same: instead of deleting, we simply build a new hash. (To test other solutions, plug them in between the two print time statements.)

    use Data::Dumper;

    my $n = 5;
    my %hash;
    @hash{ 0..100_000 } = 0..100_000;

    print time, "\n";

    my %temp;
    @temp{(keys(%hash))[0..4]} = (values(%hash))[0..4];
    %hash = %temp;

    print time, "\n";
    print Dumper(\%hash);

    Update:

    When I was writing this post, neither davido's nor TedPride's code had been posted yet. I retested all the solutions above on my new PC: every solution other than blokhead's took 0 seconds to complete (< 1 second). blokhead's took 30 seconds.

    OP's code took 0 seconds, but produces a warning: "Use of uninitialized value in delete at b.pl line 7."
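Since the built-in time only has one-second resolution, the figures above cannot separate the fast solutions from one another. Here is a minimal sketch of a finer-grained measurement, assuming the core Time::HiRes module is available. The helper name timed is hypothetical, and the hash slice used to copy the values is a small variation on the code above, not the code that was originally timed.

```perl
use strict;
use warnings;
use Time::HiRes ();   # core module; called by full name to avoid overriding built-in time

my $n = 5;

# Hypothetical helper: run a code reference once and return elapsed seconds.
sub timed {
    my ($code) = @_;
    my $start = Time::HiRes::time();
    $code->();
    return Time::HiRes::time() - $start;
}

my %hash;
@hash{ 0 .. 100_000 } = 0 .. 100_000;

my $elapsed = timed(sub {
    my @keep = (keys %hash)[ 0 .. $n - 1 ];   # any $n keys will do
    my %temp;
    @temp{@keep} = @hash{@keep};              # copy just those pairs
    %hash = %temp;
});

printf "kept %d keys in %.4f seconds\n", scalar( keys %hash ), $elapsed;
```

Any of the other solutions can be dropped into the anonymous sub in the same way.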

Re: Delete all hash keys except for n of them
by davido (Cardinal) on Oct 25, 2004 at 03:14 UTC

    Here is one way:

    use strict;
    use warnings;

    my %hash;
    @hash{ 'a' .. 'z' } = ( 1 .. 26 );

    print 'Total elements in %hash before deletion: ',
        scalar keys %hash, "\n";

    delete $hash{ +each %hash } for 1 .. keys( %hash ) - 5;

    print 'Total elements in %hash after deletion: ',
        scalar keys %hash, "\n";

    The only important line is the one starting with delete. You might wonder about the wisdom in deleting hash elements while using each, but if you check the documentation for each, you'll find it states: "It is always safe to delete the item most recently returned by each(), ..."

    This method works by first calling keys in scalar context to get the number of keys, and then subtracting five. The result is used as the high end of the range 1 .. n - 5. The for loop iterates over that range, and on each iteration each is called in scalar context, where it returns the next key. That expression is evaluated inside the { ... } brackets of $hash{ ... }, so you're basically specifying a hash element, and that hash element is handed to delete for removal. The process repeats until only five elements remain.
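The same steps can be unrolled into an explicit loop. This is only a sketch of the idea described above; the variable names $keep, $excess and $key are illustrative and do not appear in the original one-liner.

```perl
use strict;
use warnings;

my %hash;
@hash{ 'a' .. 'z' } = ( 1 .. 26 );

my $keep   = 5;
my $excess = keys(%hash) - $keep;   # keys in numeric context gives the count: 26 - 5 = 21

for ( 1 .. $excess ) {
    my $key = each %hash;           # scalar context: returns the next key only
    delete $hash{$key};             # deleting the key most recently returned by each is safe
}

print scalar( keys %hash ), " elements remain\n";   # prints "5 elements remain"
```

If the hash already holds five or fewer elements, the range is empty and the loop body never runs.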

    I have to admit, the concept of blindly deleting all but five hash elements seems a little weird. You're not deleting a truly random sequence, and you're not deleting an ordered sequence. Unless you don't care that what remains is neither random nor predictable, I have no idea how it's useful to do this. ;)


    Dave

Re: Delete all hash keys except for n of them
by blokhead (Monsignor) on Oct 25, 2004 at 02:48 UTC
    What you have isn't that bad. But if you delete the keys one at a time, you can use keys and each in scalar context to avoid putting all the keys in memory. As for elegant, who knows?
    delete $hash{ scalar each %hash } while $n < keys %hash;
    The call to keys resets the internal hash iterator before each loop, so there shouldn't be any weird surprises with that call to each.
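The iterator reset mentioned above can be observed directly. Here is a small sketch, with illustrative variable names, showing that each starts over from the first entry after keys has been called on an unmodified hash:

```perl
use strict;
use warnings;

my %hash;
@hash{ 'a' .. 'j' } = ( 1 .. 10 );

my $first  = each %hash;   # advance the iterator by one entry
my $second = each %hash;   # and by one more

my $count  = keys %hash;   # scalar-context keys: returns the count and resets the iterator
my $again  = each %hash;   # iteration starts over from the beginning

print "count=$count first=$first second=$second after reset=$again\n";
print "iterator was reset\n" if $again eq $first;
```

A plausible explanation for the slowdown reported elsewhere in this thread is that every restart has to skip over the part of the hash that has already been emptied, but that is an inference and has not been measured here.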

    Seems like a strange problem anyway... Maybe you should also consider using an array instead?

    blokhead

      Since it's been raised a few times, I'll provide a bit more information on the bigger picture.

      We have a script that does a chunk of processing and populates the results in a hash. The hash is then used for more processing in a few places, some of which are expensive.

      I wanted to be able to test the expensive bits of code with a smaller hash (taken from the real data), without having to touch each occurrence where the hash was used, so I figured modifying the hash in place would be acceptable, either by creating a temporary hash or deleting a hash slice.

      Early experiments involved testing slices like (keys %hash)[5..-1], but didn't go very far. Will have to read up on array indexes and slices a bit more.
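On why a slice like (keys %hash)[5..-1] goes nowhere: the range operator only counts upward, so 5..-1 is an empty list and the slice selects nothing. A negative index works on its own, but not as the upper end of a range. Here is a small sketch of two alternatives, assuming the keys are first copied into an array (the names @keys, %other and @other_keys are illustrative):

```perl
use strict;
use warnings;

my $n = 5;
my %hash;
@hash{ 0 .. 20 } = 100 .. 120;

my @empty = ( 5 .. -1 );                    # the range counts upward only, so this is empty
print scalar(@empty), "\n";                 # prints 0

# Option 1: $#keys is the last valid index of @keys.
my @keys = keys %hash;
delete @hash{ @keys[ $n .. $#keys ] };
print scalar( keys %hash ), "\n";           # prints 5

# Option 2: splice with only an offset removes and returns everything from $n onward,
# so the key count never has to be spelled out.
my %other;
@other{ 0 .. 20 } = 100 .. 120;
my @other_keys = keys %other;
delete @other{ splice( @other_keys, $n ) };
print scalar( keys %other ), "\n";          # prints 5
```

Both options assume the hash holds at least $n keys; splice warns if the offset is past the end of the array.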

      Thanks for the suggestions. Good to get a wider perspective on possible variations and solutions.

        Well, presumably, the data has to enter the program from somewhere. That is the place you want to modify what your data is. Instead of modifying the script that does the processing, modify the data that the script will process. This also has the side-benefit of helping you understand your data more. Maybe, it's too complicated or too simple.

        This is, in general, called "Creating Your Test Environment", and it should be one of the first things a developer does when working on an application. For, if you cannot verify that your change(s) do what they should, how do you know you did the right thing?

        As for testing very specific parts of a program, you may want to pull those parts out and test them independently. You can always use MockObjects to give it the scaffolding it needs. If you cannot do this step in a simple manner, then that is a "Red Flag"™ that should be dealt with through refactoring.

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      See my testing below; this solution chokes on a large hash.

Re: Delete all hash keys except for n of them
by TedPride (Priest) on Oct 25, 2004 at 03:12 UTC
    I've marked the important part below. Basically, each key => value pair is passed to @arr as two items, then %hash is redefined as the first 10 items (5 key => value pairs) of @arr.
    use strict;

    my %hash;
    while (<DATA>) {
        chomp;
        # Split into lexicals; implicit split to @_ no longer works on modern Perls.
        my ($key, $value) = split(/ => /);
        $hash{$key} = $value;
    }

    ######
    my @arr = (%hash);
    %hash = splice(@arr,0,10);
    ######

    for (keys %hash) {
        print "$_ => $hash{$_}\n";
    }

    __DATA__
    a => 1
    b => 2
    c => 3
    d => 4
    e => 5
    f => 6
    g => 7
Re: Delete all hash keys except for n of them
by Roy Johnson (Monsignor) on Oct 25, 2004 at 14:12 UTC
    This struck me as quite straightforward.
    %hash = map { each %hash } 0..4;

    Caution: Contents may have been coded under pressure.
      Even better!
      %a = (%a)[0 .. $n-1];
      Gautam
        Not only did I not see the code posted earlier, it was wrong too. Doh!
        %a = (%a)[0 .. $n*2-1];
Re: Delete all hash keys except for n of them
by Molt (Chaplain) on Oct 25, 2004 at 09:18 UTC
    One nice way to do this is to treat the hash as an array, then take a slice from the array and reassign it. The program below demos this approach.
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Create a test hash.
    my %test;
    $test{$_}="$_-$_" for ('aa' .. 'zz');

    # Grab N keys.
    my $num_keys = 5;
    %test = (%test)[0..$num_keys*2-1];

    # Display the result
    for (sort keys %test) {
        print "$_ = $test{$_}\n";
    }
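The factor of two in the slice comes from the flattened hash being a list of alternating keys and values, so N pairs occupy 2*N list elements. Here is a hedged sketch that wraps the idea in a helper. The name keep_n is hypothetical, the intermediate array @flat is a cautious variation on the direct list slice, and the clamp to the actual key count is an added assumption to cope with a request for more pairs than the hash holds.

```perl
use strict;
use warnings;

# Hypothetical helper: shrink the hash behind $href to at most $n pairs, in place.
sub keep_n {
    my ($href, $n) = @_;
    my $have = keys %$href;
    $n = $have if $n > $have;             # clamp so the slice never runs past the end
    my @flat = %$href;                    # key1, value1, key2, value2, ...
    %$href = @flat[ 0 .. 2 * $n - 1 ];    # N pairs = first 2*N elements (empty when $n <= 0)
    return;
}

my %test;
$test{$_} = "$_-$_" for 'aa' .. 'zz';

keep_n( \%test, 5 );
print "$_ = $test{$_}\n" for sort keys %test;
```

Because the helper works through a reference, every other place that uses the hash sees the reduced version without being touched.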