Good day, I'm trying to construct a large hash and have run into a surprising discrepancy between my benchmark results and the actual runtime. I'm creating a hash of 40000 elements (ideally up to 4000000, if feasible) and wanted to do so as quickly as possible, so I ran a test using Benchmark.

use Benchmark 'cmpthese';
use strict;

my @x = map join(',', split(//, rand(10000))), 0..80000;
my %y;

cmpthese(-2,
         {'@y{@x}'   => sub { undef %y; @y{@x} = undef },
          '%y = map' => sub { undef %y; %y = map(($_ => undef), @x) }
         }
        );

print 0+%y,$/;

which handily told me that @y{@x} = undef; is the way to go (note the s/iter column). The print statement is there to verify that %y holds at least 40000 elements, which is not guaranteed since the keys come from rand.

         s/iter %y = map   @y{@x}
%y = map   2.45       --     -49%
@y{@x}     1.25      97%       --

When I try to use this in my real code, it takes much longer (I'm still using strict and warnings):

# ...snip
  my ($n,$i,@n,@temp,%arrangements);
  # get size and create all permutations (comma separated)
  $n = 8;        @n = (1..$n);
  $i = 0;        $temp[&_fact($n)-1] = 1; # _fact := n!

  $|++;
  permute { $temp[$i++] = join(',',@n) } @n;
  print STDERR scalar(localtime),$/;
  @arrangements{@temp} = undef;
  print STDERR scalar(localtime),$/;

So, from my tests I expected to see this snippet run rather quickly since the permute function is very fast for a list of 8 elements (< 1 sec). At the very least I expected the output from the two localtimes to be close together, but instead I got:

computing 8-arrangements...
Mon Sep 30 15:30:31 2002
Mon Sep 30 15:33:35 2002

When I watch the processes in "top", the test code quickly slurps up 70M of memory, but my production code only uses 6M (and does actually work correctly). I'm not sure what the problem is and have run out of ideas for how to isolate the problem. Any suggestions would be greatly appreciated!
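One way to narrow this down might be to time the hash-slice assignment on its own, with sub-second resolution and without the permute module in the picture. The following is only a minimal sketch under stated assumptions: `all_permutations` is a hypothetical pure-Perl stand-in for permute, `$n` is lowered to 6 to keep the run short, and preallocating buckets with `keys(%arrangements) = ...` is an experiment to try, not a confirmed fix.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second timestamps instead of localtime

# Hypothetical helper (not from the original code): returns every
# permutation of its arguments as a list of array references.
sub all_permutations {
    my @items = @_;
    return ([]) unless @items;
    my @out;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($head) = splice(@rest, $i, 1);
        push @out, [$head, @$_] for all_permutations(@rest);
    }
    return @out;
}

my $n    = 6;    # assumption: smaller than the 8 in the post
my @temp = map { join(',', @$_) } all_permutations(1 .. $n);

my %arrangements;
keys(%arrangements) = scalar @temp;    # ask Perl to preallocate buckets

my $start = time;
@arrangements{@temp} = ();             # the step being measured
my $elapsed = time - $start;

printf STDERR "%d keys inserted in %.3f s\n",
    scalar(keys %arrangements), $elapsed;
```

If the slice assignment is still slow for the permutation strings but fast for the rand-based keys, that would point at the keys themselves rather than at the surrounding code.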

Thanks,
Dean

P.S. I am actually fairly certain that Perl is the proper solution for this problem, since the rest of the program runs very quickly (tested using DProf) and would be a nightmare to write in any other language.

If we didn't reinvent the wheel, we wouldn't have rollerblades.

In reply to constructing large hashes by duelafn
