http://qs1969.pair.com?node_id=11134882


in reply to Re^7: Using 'keys' on a list
in thread Using 'keys' on a list

Yes, good point.

For context, the code I'm feeding through B::Concise is a very simplified version of the original benchmark code (see 11134740 and 11134741). That code allocates very large hashes.

The interesting point is that returning a huge list of keys, in list context, is faster than returning a reference to a hash followed by calling keys on a hash dereference. Ordinarily one would expect the latter to be faster than the former.
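For anyone wanting to reproduce the comparison, here is a minimal sketch of the two cases. It is not the original benchmark from the linked nodes: the sub names `keys_as_list` and `hash_as_ref` and the hash size are made up for illustration, and the timings will vary by perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed size; the original benchmark only says "very large".
my $size = 100_000;

# Case 1: the sub calls keys itself and returns the list of keys.
sub keys_as_list {
    my %h;
    @h{ 1 .. $size } = ();
    return keys %h;
}

# Case 2: the sub returns a reference; the caller dereferences it
# and calls keys on the result.
sub hash_as_ref {
    my %h;
    @h{ 1 .. $size } = ();
    return \%h;
}

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    list => sub { my @k = keys_as_list(); },
    ref  => sub { my @k = keys %{ hash_as_ref() }; },
} );
```

Both subs build the same hash, so any difference in the table comes from how the keys reach the caller.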

Re^9: Using 'keys' on a list
by Marshall (Canon) on Jul 12, 2021 at 19:45 UTC
    I read your output from B::Concise with interest, but I don't know how to interpret it either. Memory allocation effort is a likely suspect in this mystery, but it is beyond me why the sub can do this more easily than the caller. Maybe generating a large hash also causes some large block of memory to be allocated, which makes the keys operation more efficient? I dunno.

    As an aside on how I came to deal with a very large hash: I was using the TableMatrix Tk widget. The 2-D table coordinates are hash keys like "$row,$col". It is pretty easy to wind up with 80,000 keys representing a 2-D matrix that way. At the time I did a lot of benchmarking because I wanted the GUI performance to be adequate on my target machine, which was a slow Win-XP laptop. That included pre-sizing the hash (like keys(%hash) = 8192; # number of buckets). I found that although this was faster, it made no significant difference percentage-wise in the total application CPU usage, because the work done with the hash once it was created just dwarfed the effort to create it in the first place.
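    For reference, a small sketch of that kind of pre-sized hash with "$row,$col" keys. The dimensions and the cell contents are invented for illustration; only the lvalue form of keys and the key format come from the description above.

```perl
use strict;
use warnings;

my %cell;

# Assigning to keys() asks perl to allocate at least this many buckets
# up front (perl rounds the request up to a power of two).
keys(%cell) = 8192;

# Assumed dimensions: 400 x 200 gives the 80,000 cells mentioned above.
my ( $rows, $cols ) = ( 400, 200 );

for my $row ( 0 .. $rows - 1 ) {
    for my $col ( 0 .. $cols - 1 ) {
        # Placeholder value; a real table would hold the cell text.
        $cell{"$row,$col"} = "r${row}c${col}";
    }
}

printf "%d cells stored\n", scalar keys %cell;
```

    Pre-sizing only saves the incremental bucket growth while the hash is filled, which fits the observation that it barely moved the application's total CPU usage.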

    To sort this thing by column, I flattened it out to a 2-D array, sorted it via the Schwartzian Transform (ST), then rebuilt the hash key representation. This achieved my GUI performance goals at the time and I didn't pursue it further. Getting just the keys of a large hash has never come up in my programming, although I can see where that could happen, perhaps for an object method. Most times that I've used a large hash, it has been important to know the values as well as the keys (exceptions would be getting unique values or something like that).
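    A rough sketch of that flatten, sort, rebuild sequence on a tiny table. The data, the dimensions and the choice of sort column are made up, and this is only one way to write it, not the code I actually used.

```perl
use strict;
use warnings;

# Hypothetical 3 x 2 table keyed as "row,col".
my %cell = (
    "0,0" => "pear",   "0,1" => 30,
    "1,0" => "apple",  "1,1" => 10,
    "2,0" => "cherry", "2,1" => 20,
);
my ( $rows, $cols, $sort_col ) = ( 3, 2, 0 );

# Flatten the hash into an array of row arrays.
my @table = map {
    my $r = $_;
    [ map { $cell{"$r,$_"} } 0 .. $cols - 1 ]
} 0 .. $rows - 1;

# Schwartzian Transform: pair each row with its sort key,
# sort on the key, then strip the key off again.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ $_->[$sort_col], $_ ] } @table;

# Rebuild the "row,col" keys in the new row order.
%cell = ();
for my $r ( 0 .. $#sorted ) {
    $cell{"$r,$_"} = $sorted[$r][$_] for 0 .. $cols - 1;
}
```

    With a sort key that is just an array element, the transform saves little; it pays off when computing the key is expensive, for example when it has to be parsed out of the cell text.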

    I guess unless some more experienced Monk can explain this, it will remain a weird quirky mystery? Clearly some optimization is happening. That optimization may have side effects that conceivably could even be detrimental? There always seems to be a "plus vs minus" for these things.