http://qs1969.pair.com?node_id=11134740


in reply to Using 'keys' on a list

Further down in this thread, I got trolled by an Anon Monk.

Instead of complicated syntax, I would recommend something easier.
Performance-wise it is always better to return fewer things from the sub() that more things.
That should be obvious!

The fastest way (performance-wise) and often the easiest way for a sub() to return a hash to the caller is via a reference to the hash.
Don't worry at all about taking a couple of lines of code vs one line. Sometimes 3 lines are faster and easier to understand than a single line.

Here are some benchmarks for your perusal.
Number (3) below is a "go to" winner!

use strict; use warnings; use Benchmark; use Data::Dumper; # Note: You can increment a string in Perl! WOW! # As long as you don't use the string in a numeric context! # $string++ is quite different than $string +=1 # # Below, this is used to make a bunch of unique hash keys # I don't think that the fact that they are sequential in # an alphabetic sense makes much difference in the generated # hash table because of the way that the Perl hash algorithm # works. sub return_hash { # create a large hash and return that entire hash as list my $string = "ABCDEFGHIJ"; my %hash = map{$string++ => 1}(1..100000); return %hash; } sub return_hash_ref { # create a large hash and return a ref to that hash my $string = "ABCDEFGHIJ"; my %hash = map{$string++ => 1}(1..100000); return \%hash; } sub return_just_keys { # create a large hash and return just the keys of that hash my $string = "ABCDEFGHIJ"; my %hash = map{$string++ => 1}(1..100000); return keys %hash; } timethese(1000, { '1)Keys of Hash via list' => 'my @keys = keys %{{return_hash()}}', '2)Keys of Local Hash copy' => 'my %hash2 = return_hash(); my @key +s = keys %hash2;', '3)Keys of local Hash Ref' => 'my $href = return_hash_ref(); my @k +eys = keys %$href;', '4)Just the returned keys' => 'my @keys = return_just_keys()', }); __END__ Benchmark: timing 1000 iterations of 1)Keys of Hash via list, 2)Keys of Local Hash copy, 3)Keys of local Hash Ref, 4)Just the returned keys... 1)Keys of Hash via list: 182 wallclock secs (179.20 usr + 1.97 sys = +181.17 CPU) @ 5.52/s (n=1000) 2)Keys of Local Hash copy: 184 wallclock secs (182.69 usr + 0.42 sys += 183.11 CPU) @ 5.46/s (n=1000) 3)Keys of local Hash Ref: 130 wallclock secs (129.56 usr + 0.72 sys = + 130.28 CPU) @ 7.68/s (n=1000) 4)Just the returned keys: 125 wallclock secs (125.20 usr + 0.52 sys = + 125.72 CPU) @ 7.95/s (n=1000)
Added: When looking at benchmarks, don't worry so much about 182 vs 184. That could be an artifact of a particular run. Most of the CPU MIP's are being consumed by creating and re-creating the hash. Look at ~180 vs ~130. The relative performance difference between the 2 methods of passing the result back is actually much, more than that because generating the hash "takes a lot of effort".

Replies are listed 'Best First'.
Re^2: Using 'keys' on a list
by swl (Parson) on Jul 07, 2021 at 00:04 UTC

    I'm surprised that returning a list of keys is faster than returning a hashref and then getting its keys. A dereference requires more operations, but should it be that much?

    A good chunk of the time is spent in building the hash, so I added a baseline sub to measure it. I also reduced the number of iterations so it would run in reasonable time on my machine. And some minor formatting of results.

    This is perl 5, version 28, subversion 0 (v5.28.0) built for MSWin32-x +64-multi-thread
    # Adapted from https://www.perlmonks.org/?node_id=11134740 use strict; use warnings; use Benchmark; use Data::Dumper; # Note: You can increment a string in Perl! WOW! # As long as you don't use the string in a numeric context! # $string++ is quite different than $string +=1 # # Below, this is used to make a bunch of unique hash keys # I don't think that the fact that they are sequential in # an alphabetic sense makes much difference in the generated # hash table because of the way that the Perl hash algorithm # works. sub baseline { # create a large hash but return nothing my $string = "ABCDEFGHIJ"; my %hash = map{$string++ => 1}(1..100000); return; } sub return_hash { # create a large hash and return that entire hash as list my $string = "ABCDEFGHIJ"; my %hash = map{$string++ => 1}(1..100000); return %hash; } sub return_hash_ref { # create a large hash and return a ref to that hash my $string = "ABCDEFGHIJ"; my %hash = map{$string++ => 1}(1..100000); return \%hash; } sub return_just_keys { # create a large hash and return just the keys of that hash my $string = "ABCDEFGHIJ"; my %hash = map{$string++ => 1}(1..100000); return keys %hash; } timethese(200, { '1)Keys of Hash via list ' => 'my @keys = keys %{{return_hash()}} +', '2)Keys of Local Hash copy' => 'my %hash2 = return_hash(); my @key +s = keys %hash2;', '3)Keys of local Hash Ref ' => 'my $href = return_hash_ref(); my @ +keys = keys %$href;', '4)Just the returned keys ' => 'my @keys = return_just_keys()', '5)Baseline ' => 'my $res = baseline()', }); __END__ Benchmark: timing 200 iterations of 1)Keys of Hash via list , 2)Keys +of Local Hash copy, 3)Keys of local Hash Ref , 4)Just the returned ke +ys , 5)Baseline ... 1)Keys of Hash via list : 50 wallclock secs (49.44 usr + 1.52 sys = +50.95 CPU) @ 3.93/s (n=200) 2)Keys of Local Hash copy: 50 wallclock secs (49.98 usr + 0.38 sys = +50.36 CPU) @ 3.97/s (n=200) 3)Keys of local Hash Ref : 35 wallclock secs (34.75 usr + 0.34 sys = +35.09 CPU) @ 5.70/s (n=200) 4)Just the returned keys : 33 wallclock secs (32.51 usr + 0.16 sys = +32.67 CPU) @ 6.12/s (n=200) 5)Baseline : 25 wallclock secs (24.86 usr + 0.20 sys = +25.06 CPU) @ 7.98/s (n=200)
      I liked your idea of adding a "baseline"!
      I also really don't know either why generating the list of keys in the sub appears to be faster then giving the caller the ref and having him do it? I would have expected that difference to be smaller. Hopefully some other Monk knows?

      However, the main point remains: If the caller needs the whole hash, give him a ref to a hash. This is much faster than passing the entire hash back as a list. Of course there are memory allocation issues with that because Perl will keep the memory for that hash allocated as long as there is reference to it.

      Added: Except as a part of an object method, I don't know why a sub() in general would create a hash, only to just pass back just the keys? Seems a bit weird, but I'd also like to know why this appears to be somewhat faster.

        "I don't know why a sub() in general would create a hash, only to just pass back just the keys?"

        There are likely to be lots of reasons, but a simple one is to generate a list of unique results from some process.

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond