I was reading this article: http://www.sysarch.com/Perl/sort_paper.html (actually, mentioned here: Re: Sorting geometric coordinates based on priority), and found this:

If the approximate size of the data set is known, preallocating the hash improves performance.

keys my %cache = @in; $cache{$_} = KEY($_) for @in;

The following sets up the cache more efficiently, using a hash slice:

keys my %cache = @in; @cache{@in} = map KEY($_) => @in;

I liked the idiom

keys %h = @a; @h{ @a } = ... ; # do something useful

and thought it would be nice to remember and use it sometimes. While, of course, I was using hash slices before, but rather because they look so concise and, somehow because of this, I felt that code, as a result, must, indeed, be more efficient. And now additional optimization through "magical" use of keys as lvalue, forcing scalar context on array. Actually, keys mentions this optimization, but I missed it before:

Used as an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big.

Then I thought it strange that assigning to large hash slice still requires this "preallocation". Then I ran this test:

use strict; use warnings; use Benchmark qw/ cmpthese /;; for my $count ( 100, 1_000, 10_000, 100_000 ) { cmpthese( -5, { 1 => sub { my @a = map { log } 2 .. $count; my %h; keys %h = @a; $h{ $_ } = log for @a; return \%h }, 2 => sub { my @a = map { log } 2 .. $count; my %h; $h{ $_ } = log for @a; return \%h }, 3 => sub { my @a = map { log } 2 .. $count; my %h; keys %h = @a; @h{ @a } = map { log } @a; return \%h }, 4 => sub { my @a = map { log } 2 .. $count; my %h; @h{ @a } = map { log } @a; return \%h }, }) }

log is here to imitate at least some payload (useful work), and to create longer hash keys (if it matters). Returning a reference so that Perl doesn't sniff we don't need this hash and won't skip any work. And that's because of results:

Rate 4 3 2 1 4 1507/s -- -3% -4% -5% 3 1549/s 3% -- -2% -3% 2 1576/s 5% 2% -- -1% 1 1593/s 6% 3% 1% -- Rate 1 2 4 3 1 140/s -- -3% -4% -4% 2 145/s 3% -- -0% -1% 4 145/s 4% 0% -- -0% 3 146/s 4% 1% 0% -- Rate 4 2 3 1 4 12.1/s -- -7% -8% -9% 2 12.9/s 7% -- -2% -3% 3 13.1/s 9% 2% -- -1% 1 13.3/s 10% 3% 1% -- s/iter 4 1 2 3 4 1.39 -- -3% -3% -3% 1 1.35 3% -- -0% -0% 2 1.35 3% 0% -- -0% 3 1.35 3% 0% 0% --

No meaningful difference at all. So, are my tests flawed, or claims about efficiency of slices and preallocation don't hold any water?


In reply to Does "preallocating hash improve performance"? Or "using a hash slice"? by vr

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.