I read your output from B::Concise with interest, but I don't know how to interpret it either. Memory allocation effort is a likely suspect in this mystery, but it is beyond me why the sub can do this more easily than the caller. Maybe generating a large hash also causes some large block of memory to be allocated, which then makes the keys operation more efficient? I dunno.
As an aside on how I came to deal with a very large hash: I was using the TableMatrix Tk widget. The 2-D table coordinates are hash keys like "$row,$col". It is pretty easy to wind up with 80,000 keys representing a 2-D matrix that way. At the time I did a lot of benchmarking because I wanted the GUI performance to be adequate on my target machine, which was a slow Win-XP laptop. Much of that benchmarking was with pre-sizing the hash (like keys(%hash) = 8192; # num of buckets). I found out that although this was faster, it made no significant difference percentage-wise in the total application CPU usage, because the work done with the hash once it was created just dwarfed the effort to create it in the first place.
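For anyone who hasn't seen pre-sizing before, here is a minimal sketch of what I mean. The 200 x 400 table size and the cell contents are made up for illustration; they are not from my original application.

```perl
use strict;
use warnings;

my %cell;

# Ask Perl to allocate the buckets up front. Perl rounds the request
# up to a power of two (8192 already is one).
keys(%cell) = 8192;

# Hypothetical 200 x 400 table, keyed the way the widget expects.
for my $row (0 .. 199) {
    for my $col (0 .. 399) {
        $cell{"$row,$col"} = "r${row}c${col}";
    }
}

# keys in scalar context returns the number of keys.
my $count = keys %cell;
print "$count keys\n";
```

Whether this buys anything measurable depends on the workload; in my case it did not matter once the rest of the application was counted.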
To sort this thing by column, I flattened it out to a 2-D array, sorted that with a Schwartzian Transform (ST), then rebuilt the hash key representation. This achieved my GUI performance goals at the time and I didn't pursue it further. Getting just the keys of a large hash has never come up in my programming, although I can see where that could happen, perhaps in an object method. Most times that I've used a large hash, it has been important to know the values as well as the keys (exceptions would be getting unique values or something like that).
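Roughly, the flatten / sort / rebuild steps look like the sketch below. This is a reconstruction on a tiny made-up table, not my original code, and the variable names are just for illustration. With a sort key this cheap the ST is arguably overkill, but it shows the shape of the approach.

```perl
use strict;
use warnings;

# Hypothetical 3 x 2 table in "$row,$col" form.
my %cell = (
    '0,0' => 'pear',   '0,1' => 30,
    '1,0' => 'apple',  '1,1' => 10,
    '2,0' => 'cherry', '2,1' => 20,
);

# 1. Flatten the hash into an array of rows.
my @rows;
for my $key (keys %cell) {
    my ($row, $col) = split /,/, $key;
    $rows[$row][$col] = $cell{$key};
}

# 2. Sort the rows numerically on one column with a Schwartzian
#    Transform: decorate each row with its sort key, sort, undecorate.
my $sort_col = 1;
my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map  { [ $_->[$sort_col], $_ ] }
             @rows;

# 3. Rebuild the "$row,$col" keys from the new row order.
my %sorted_cell;
for my $row (0 .. $#sorted) {
    for my $col (0 .. $#{ $sorted[$row] }) {
        $sorted_cell{"$row,$col"} = $sorted[$row][$col];
    }
}

# Print column 0 in the new row order.
print join(' ', map { $sorted_cell{"$_,0"} } 0 .. $#sorted), "\n";
```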
I guess unless some more experienced Monk can explain this, it will remain a weird quirky mystery? Clearly some optimization is happening. That optimization may have side effects that conceivably could even be detrimental? There always seems to be a "plus vs minus" for these things.