Here is a script that measures the memory used by the two approaches with fairly large hashes (credit to the Anonymous Monk who posted it in Re: Determining the Memory Usage of a Perl program from within Perl):
    use strict;
    use vars qw(%h1 %h2);

    sub get_statm_info {
        my $error = '';
        my $ref = {};

        if( ! open(_INFO,"</proc/$_[0]/statm") ){
            $error = "Couldn't open /proc/$_[0]/statm [$!]";
            return $ref;
        }
        my @info = split(/\s+/,<_INFO>);
        close(_INFO);

        ### these are all the props (skip some)
        # size resident shared trs lrs drs dt
        ###
        ### get the important ones
        # statm counts pages; * 4 converts to kB, assuming 4 kB pages
        $ref = {size     => $info[0] * 4,
                resident => $info[1] * 4,
                shared   => $info[2] * 4};
        return $ref;
    }

    # Double hash case.
    %h1 = map { $_, 16 } 1..10000;
    %h2 = map { $_, 20 } 1..10000;

    # Single hash of arrays.
    #%h1 = map { $_, [16,20] } 1..10000;

    my $ref = get_statm_info($$);
    print $ref->{size},"\n";
    print $ref->{resident},"\n";
    print $ref->{shared},"\n";

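I measured 4352k for the two-hash case and 4944k for the single hash of arrays. It makes sense to me that the hash of arrays costs more: every one of the 10,000 values is a reference to its own anonymous array, and each of those arrays carries overhead on top of the two numbers it holds.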
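To make the difference between the two layouts concrete, here is a small sketch with five keys instead of 10,000. The name %hoa is mine, not from the script above; it just shows that each value in the hash of arrays is a separate array reference, while the two-hash layout stores plain scalars.

```perl
use strict;
use warnings;

# Layout 1: two parallel hashes keyed by the same id.
my %h1 = map { $_ => 16 } 1 .. 5;
my %h2 = map { $_ => 20 } 1 .. 5;

# Layout 2: one hash whose values are references to two-element arrays.
# (%hoa is an illustrative name, not from the original script.)
my %hoa = map { $_ => [16, 20] } 1 .. 5;

print "two hashes:     $h1{3} $h2{3}\n";
print "hash of arrays: $hoa{3}[0] $hoa{3}[1]\n";
print "value type:     ", ref($hoa{3}), "\n";
```

Both layouts return the same pair of numbers for a key; the hash of arrays pays for an extra array per key to get there.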

What data structure makes sense for a search engine? I'm not sure either of these does, if you are scoring hundreds of thousands of pages. You don't want to put all those scores and page references in memory and then sort them.

What might make sense is a heap, which lets you keep the top N scores partially sorted with logarithmic insertion time. There's a Heap module on CPAN as a starting point.
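As a rough sketch of the idea, here is a hand-rolled binary min-heap in plain Perl that keeps only the N best scores seen so far. It deliberately does not use the Heap module's API; make_top_n, the page names and the scores are all made up for illustration, and it assumes N is at least 1.

```perl
use strict;
use warnings;

# Keep the N highest-scoring pages in a binary min-heap stored in an array.
# Each entry is [score, page]; the root (index 0) is the lowest score kept,
# so a new score only needs to be compared against the root.
sub make_top_n {
    my ($n) = @_;    # assumed to be >= 1
    my @heap;

    # Move the entry at index $i up until its parent is not larger.
    my $sift_up = sub {
        my ($i) = @_;
        while ($i > 0) {
            my $parent = int(($i - 1) / 2);
            last if $heap[$parent][0] <= $heap[$i][0];
            @heap[$parent, $i] = @heap[$i, $parent];
            $i = $parent;
        }
    };

    # Move the entry at index $i down until neither child is smaller.
    my $sift_down = sub {
        my ($i) = @_;
        while (1) {
            my ($l, $r, $min) = (2 * $i + 1, 2 * $i + 2, $i);
            $min = $l if $l < @heap && $heap[$l][0] < $heap[$min][0];
            $min = $r if $r < @heap && $heap[$r][0] < $heap[$min][0];
            last if $min == $i;
            @heap[$min, $i] = @heap[$i, $min];
            $i = $min;
        }
    };

    # Offer one (score, page) pair; it is kept only if it beats the
    # current worst of the top N.
    my $add = sub {
        my ($score, $page) = @_;
        if (@heap < $n) {
            push @heap, [$score, $page];
            $sift_up->($#heap);
        }
        elsif ($score > $heap[0][0]) {
            $heap[0] = [$score, $page];
            $sift_down->(0);
        }
    };

    # Return the kept entries, best score first.
    my $results = sub {
        return sort { $b->[0] <=> $a->[0] } @heap;
    };

    return ($add, $results);
}

# Hypothetical scores for five pages; keep the best three.
my ($add, $results) = make_top_n(3);
$add->(@$_) for [0.42, 'a.html'], [0.91, 'b.html'], [0.10, 'c.html'],
                [0.77, 'd.html'], [0.55, 'e.html'];
printf "%s %.2f\n", $_->[1], $_->[0] for $results->();
```

Memory stays proportional to N rather than to the number of pages scored, and only the final N entries ever get fully sorted.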


In reply to Re: Hash of Arrays versus Two Hashes by tall_man
in thread Hash of Arrays versus Two Hashes by Cody Pendant
