fiddler42 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, All,

Although I use perl fairly regularly, I don't have to create complex hash tables very often. In addition, I can usually heavily leverage hash table examples I have in pre-existing scripts. Every so often, though, I have to create a hash table that doesn't look much like anything I have, and I get stuck pretty easily. Alas, it has happened again. I am creating a table that is formatted like so:-

$NetNameAndFanout = join (" ",$NetName,$NetFanout); $NetStatsHash{$NetNameAndFanout} = join (" ",$NetCap,$NetRes,$NetLengt +h,$FirstDriver);
As you can see, the key is a simple string. But I need to split the key to properly work with its contents. Here is what I am doing:-
for ($i = 1; $i <= $UpperFanoutLimitOfTable; $i++) { $TotalCap[$i] = 0; $TotalRes[$i] = 0; $TotalLength[$i] = 0; $FanoutCount = 0; foreach $MasterKey (keys %NetStatsHash) { @KeyEntries = split (/\s+/,$MasterKey); $KeyNetName = $KeyEntries[0]; $KeyFanout = $KeyEntries[1]; $OkayToAdd = 0; if ($KeyFanout == $i) { @NetDetails = split (/\s+/,$NetStatsHash{$MasterKey}); $NetCap = $NetDetails[0]; $NetRes = $NetDetails[1]; $NetLength = $NetDetails[2]; $FirstDriver = $NetDetails[3]; if ($DebugMode == 1) {print ("$i $NetCap $NetRes $NetLengt +h $FirstDriver\n");} if (($UseDriverCell == 0) && ($UseNetPattern == 0)) {$Okay +ToAdd = 1;} if ($UseDriverCell == 1) { if ($FirstDriver =~ /$DriverPattern/) { $OkayToAdd = 1; } } if ($UseNetPattern == 1) { if ($KeyNetName =~ /$NetPattern/) { $OkayToAdd = 1; } } if ($OkayToAdd == 1) { if ($DebugMode == 1) {print ("Processing net $KeyNetNa +me (driver = $FirstDriver)...\n");} $TotalCap[$i] = $TotalCap[$i] + $NetCap; $TotalRes[$i] = $TotalRes[$i] + $NetRes; $TotalLength[$i] = $TotalLength[$i] + $NetLength; $FanoutCount++; } } } if ($FanoutCount > 0) { $CapAvg[$i] = $TotalCap[$i]/$FanoutCount; $ResAvg[$i] = $TotalRes[$i]/$FanoutCount; $LengthAvg[$i] = $TotalLength[$i]/$FanoutCount; } if ($FanoutCount == 0) { print ("No nets match input criteria when fanout = $i. Please + review command-line settings--they may be too restrictive!\n"); } if ($DebugMode == 1) { printf ("CapAvg[$i] = %0.6f\n",$CapAvg[$i]); printf ("ResAvg[$i] = %0.6f\n",$ResAvg[$i]); printf ("LengthAvg[$i] = %0.6f\n",$LengthAvg[$i]); } }
...but I just don't like it. I really want to be able to sort the table by $NetFanout, an integer, without having to split the master key every time. In addition, when I get to $KeyFanout = $UpperFanoutLimitOfTable + 1, I want to quickly exit out of the master for ($i) loop. (I think that will be easy to manage once the hash is sorted properly.)

All of my attempts to create a multi-key hash have not worked out so well. How would a real expert clean up the above? Note that I can't use $NetFanout as a lone key--I have about 1.8 million total table entries, but less than 50 $NetFanout values, hence I use $NetName to make the key unique.

Thanks,

-fiddler42

Replies are listed 'Best First'.
Re: Managing multi-key hash tables....
by apl (Monsignor) on Apr 27, 2008 at 15:28 UTC
    You could use a Hash of Hashes, so that instead of having one key, you'd have two.

    Some of your code from above would change from:

    $NetNameAndFanout = join (" ",$NetName,$NetFanout); $NetStatsHash{$NetNameAndFanout} = join (" ",$NetCap,$NetRes,$NetLengt +h,$FirstDriver);
    would become
    $NetStatsHash{$NetName}{$NetFanout}{Cap} = $NetCap; $NetStatsHash{$NetName}{$NetFanout}{Length} = $NetLength; $NetStatsHash{$NetName}{$NetFanout}{Driver} = $FirstDriver;

    In that way, instead of

    foreach $KeyNetName (keys %NetStatsHash) { @KeyEntries = split (/\s+/,$MasterKey); $KeyNetName = $KeyEntries[0]; $KeyFanout = $KeyEntries[1]; $OkayToAdd = 0; if ($KeyFanout == $i) { @NetDetails = split (/\s+/,$NetStatsHash{$MasterKey}); $NetCap = $NetDetails[0]; $NetRes = $NetDetails[1]; ### etc.
    you'd say
    foreach $KeyNetName (keys %NetStatsHash) { foreach $KeyFanOut ( keys %{ $NetStatsHash{$KeyNetName} } ) { $OkayToAdd = 0; if ($KeyFanout == $i) { $NetCap = $NetStatsHash{$KeyNetName}{$KeyFanOut}{Cap}; $NetRes = $NetStatsHash{$KeyNetName}{$KeyFanOut}{Length}; ### etc. }
    This is untested, but the principle is there... Good luck, and post if there are any problems or further questions.
Re: Managing multi-key hash tables....
by BrowserUk (Patriarch) on Apr 27, 2008 at 18:05 UTC

    If you really need to use composite keys, then if you order the elements the other way around, you can let perl pick off the fanout number from the front of the string, and just silence the 'not numeric" warnings:

    $h{ int( rand 50 ) . ' '. rndStr( 1+int(rand(20) ), 'A'..'Z' ) } = 'just a placeholder' for 1 ..50;; print "$_ : $h{ $_ }" for sort{ local $^W; $a <=> $b } keys %h;; 0 YLJG : just a placeholder 1 ERHC : just a placeholder 2 UEGHWMGJVWDAUEOJR : just a placeholder 3 TPFZPXPS : just a placeholder 3 VDII : just a placeholder 4 SHHSRQLJVXTOPBYXT : just a placeholder 4 RXWZUUDA : just a placeholder 7 IPSCGIHLBWPR : just a placeholder 9 JRPAZKPXSC : just a placeholder 10 BGE : just a placeholder 11 OI : just a placeholder 11 HYXKTCLHDWCO : just a placeholder 14 QU : just a placeholder 16 ASJOWMLOWWMJWHDRAZ : just a placeholder 17 SQUJMOIM : just a placeholder 18 PFZDZODFMLBCVLTXQF : just a placeholder 18 AYYUCHEUXMNTTVLTJB : just a placeholder 19 KBDNXHCLUHEXSM : just a placeholder 20 EGWOTWSNEBXDISO : just a placeholder 20 QCCXXYEUFMRTYSYDGIBQ : just a placeholder 22 MBGBHAGRDEQXS : just a placeholder 24 IQWSMASGOOKQZC : just a placeholder 27 XRWPIZFKHGNEMTZGEGGB : just a placeholder 27 BHYAWIULV : just a placeholder 28 NNFUODXBGHKGAXKEZJIO : just a placeholder 29 SRQKDZIJVIDDWKQMUEZ : just a placeholder 30 OGFVEODJJGJRTRJ : just a placeholder 32 YMBMILY : just a placeholder 32 GCSQRJHHLJZEX : just a placeholder 33 HNNPVDVXMVPEQTISJ : just a placeholder 34 LCJKOFVJM : just a placeholder 34 HWMLT : just a placeholder 34 CVRFVEYISPLZJN : just a placeholder 36 HUJIAGMCRSSEACCBSES : just a placeholder 37 TLYYNSLVRAMZTXKDMAQC : just a placeholder 38 DAYGYGLMVZMXQTHIJBEV : just a placeholder 40 Q : just a placeholder 40 BNZBSPTUUTQBQMLAD : just a placeholder 41 FDFBWHPUQQRMVV : just a placeholder 42 JKTKYC : just a placeholder 42 LRLOBPONBCPKG : just a placeholder 45 DDBSCVQVKAUP : just a placeholder 45 RNHMIRVLMMYVYRMA : just a placeholder 45 HVETVTKMTUH : just a placeholder 46 EJEQMCIXCMGX : just a placeholder 46 UMIH : just a placeholder 46 GWZM : just a placeholder 46 POOWQOUASNALUX : just a placeholder 49 CZPDKKGAKQZK : just a placeholder 49 UJSMQRNEBQ : just a placeholder

    For your early exit, do the same thing:

    for my $key ( sort{ local $^W; $a <=> $b } keys %h ) { last if do{ local $^W; 0 + $key } >= 20; print "$key : $h{ $key }"; };; 0 YLJG : just a placeholder 1 ERHC : just a placeholder 2 UEGHWMGJVWDAUEOJR : just a placeholder 3 TPFZPXPS : just a placeholder 3 VDII : just a placeholder 4 SHHSRQLJVXTOPBYXT : just a placeholder 4 RXWZUUDA : just a placeholder 7 IPSCGIHLBWPR : just a placeholder 9 JRPAZKPXSC : just a placeholder 10 BGE : just a placeholder 11 OI : just a placeholder 11 HYXKTCLHDWCO : just a placeholder 14 QU : just a placeholder 16 ASJOWMLOWWMJWHDRAZ : just a placeholder 17 SQUJMOIM : just a placeholder 18 PFZDZODFMLBCVLTXQF : just a placeholder 18 AYYUCHEUXMNTTVLTJB : just a placeholder 19 KBDNXHCLUHEXSM : just a placeholder

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Managing multi-key hash tables....
by ivancho (Hermit) on Apr 28, 2008 at 18:57 UTC
    Hash of hashes on 1.8 million entries might be a little heavy on the memory..
    Additionally, you hit yourself for 50x the runtime, since you iterate through all the hash keys for every NetFanout in which you are interested.
    What you need to ask yourself here is whether the hash table is required at all.. A hash table gives you fast random access to entries - you don't use that anywhere. All you need to do is iterate through it all. Plus, your $NetFanout is a small integer number - why hash it at all? - use it to index into an array directly - this way you don't even need to explicitly sort, you get that for free.
    Instead of doing
    $NetNameAndFanout = join (" ",$NetName,$NetFanout); $NetStatsHash{$NetNameAndFanout} = join (" ",$NetCap,$NetRes,$NetLengt +h,$FirstDriver);
    try
    use constant (p_name => 0, p_cap => 1, p_res => 2, p_length => 3, p_driver => 4); push @{$NetStats[ $NetFanout ] ||= []}, [ $NetName, $NetCap, $NetRes, +$NetLength, $FirstDriver, ];
    Note that I use an extra technique to store things in small arrays to save on memory, but enumerate their positions to keep further code readable.
    Now (with some other cleanup) your summary code becomes:
    my $filter = sub { return 0 if ($UseDriverCell and $_[0]->[p_driver] !~ /$DriverPattern +/ ); return 0 if ($UseNetPattern and $_[0]->[p_name] !~ /$NetPattern/ ) +; # If we are here, then either we matched all requirements, or there +weren't any return 1; }; foreach my $i (1..$UpperFanoutLimitOfTable) { my $Count = 0; foreach my $net (grep {$filter->($_)} @{$NetStats[$i]}) { if ($DebugMode) { .. do debugging output .. } $TotalCap[$i] += $net->[p_cap]; $TotalRes[$i] += $net->[p_res]; $TotalLength[$i] += $net->[p_length]; $FanoutCount++; } if ($Count > 0) { $CapAvg[$i] = $TotalCap[$i] / $Count; $ResAvg[$i] = $TotalRes[$i] / $Count; $LengthAvg[$i] = $TotalLength[$i] / $Count; if ($DebugMode) { printf ("CapAvg[$i] = %0.6f\n",$CapAvg[$i]); printf ("ResAvg[$i] = %0.6f\n",$ResAvg[$i]); printf ("LengthAvg[$i] = %0.6f\n",$LengthAvg[$i]); } } else { print ("No nets match input criteria when fanout = $i. Please rev +iew command-line settings--they may be too restrictive!"); } }