dexahex has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

What are the pros and cons of using multidimensional hash emulation compared to using hash of hashes?

Consider the following code:

%the_hash = (
    empl_john_id         => 13,
    empl_john_position   => 'slave',
    empl_bob_id          => 0,
    empl_bob_position    => 'manager',
    client_fred_id       => 2,
    client_fred_phone    => 12345,
    client_goldman_id    => 0,
    client_goldman_phone => 666,
);

%the_hash2 = (
    empl => {
        john => { id => 13, position => 'slave' },
        bob  => { id => 0,  position => 'manager' },
    },
    client => {
        fred    => { id => 2, phone => 12345 },
        goldman => { id => 0, phone => 666 },
    },
);

Will %the_hash2 work faster for fetch/store operations?

Is there a generalized rule on this TIMTOWTDI?

Thanks

Re: multidimensional hash emulation vs hash of hashes
by moritz (Cardinal) on Aug 18, 2011 at 15:30 UTC
    Will %the_hash2 work faster for fetch/store operations?

    You can use Benchmark to find out.

    There are two main advantages of nested hashes. The first is that there is no ambiguity if the separator character (here _) appears in the keys. The second is that you can easily extract hash refs of data that belong together. For example, if you have a subroutine that operates on one person's data, you can just pass $the_hash2{client}{fred} to it, and it has all the data it needs, and no more.
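    For instance (a minimal sketch; print_person is a hypothetical helper, not from the thread):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: works on one person's slice of the structure
    sub print_person {
        my ($name, $person) = @_;
        print "$name: id=$person->{id}, phone=$person->{phone}\n";
    }

    my %the_hash2 = (
        client => {
            fred => { id => 2, phone => 12345 },
        },
    );

    # The nested layout lets us hand over exactly the data the sub needs
    print_person('fred', $the_hash2{client}{fred});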

    In terms of performance I can only guess that emulated hashes typically win by a close margin, but again that should be benchmarked.

Re: multidimensional hash emulation vs hash of hashes
by kennethk (Abbot) on Aug 18, 2011 at 15:36 UTC
    First, anytime you are curious about a performance question, you should Benchmark. Nothing gives you truth like experiment.

    #!/usr/bin/perl -w
    use strict;
    use Benchmark qw':all :hireswallclock';

    my %the_hash = (
        empl_john_id         => 13,
        empl_john_position   => 'slave',
        empl_bob_id          => 0,
        empl_bob_position    => 'manager',
        client_fred_id       => 2,
        client_fred_phone    => 12345,
        client_goldman_id    => 0,
        client_goldman_phone => 666,
    );
    my %the_hash2 = (
        empl => {
            john => { id => 13, position => 'slave' },
            bob  => { id => 0,  position => 'manager' },
        },
        client => {
            fred    => { id => 2, phone => 12345 },
            goldman => { id => 0, phone => 666 },
        },
    );

    cmpthese(10**6, {
        'emulation' => sub { $the_hash{client_goldman_phone}++ },
        'HoH'       => sub { $the_hash2{client}{goldman}{phone}++ },
    });

    yields

    (warning: too few iterations for a reliable count)
                    Rate       HoH emulation
    HoH        1883239/s        --      -27%
    emulation  2564103/s       36%        --

    In a literal sense, multidimensional hash emulation will be faster than using a hash of hashes (at least for this test), mostly because there is no need to dereference the sub-elements (I think). However, this is highly unlikely to be the bottleneck in your code, and multidimensional hash emulation is considered poor form because it makes code harder to understand, and hence harder to debug and maintain. The technique mainly predates the introduction of proper references into Perl.

    For a discussion of hash performance, see A short meditation about hash search performance.

      While certainly a good first step, this benchmark is very one-dimensional (benchmarks usually are :-)

      For example, the choice of data representation also dictates how you process the keys for accessing the information, and the data structure also needs to be built, which might take a significant amount of time as well.
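      To illustrate (a sketch only; the structure shapes and iteration counts here are invented for the comparison):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      # Compare the cost of *building* each representation, not just access
      cmpthese(10**5, {
          emulation => sub {
              my %h;
              $h{"empl_person${_}_id"} = $_ for 1 .. 10;
          },
          HoH => sub {
              my %h;
              $h{empl}{"person$_"}{id} = $_ for 1 .. 10;
          },
      });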

Re: multidimensional hash emulation vs hash of hashes
by blue_cowdawg (Monsignor) on Aug 18, 2011 at 15:33 UTC
        Is there a generalized rule on this TIMTOWTDI?

    My own personal feeling on this: I tend to look at hashes the same way I look at structs in C programming. It's a handy-dandy way of grouping related information together so you can act on that group as a single unit. This is not to be confused with C++ classes in any way.

    Since hashes are put together on the fly, Perl has no way of enforcing that all of the elements are present when the hash is created. Let's look at an example of what I'm talking about:

    my $company = {
        employees => {
            fred => {
                fullname   => "Freddy Freeloader",
                position   => "panhandler",
                management => "no",
            },
            clem => {
                realname    => "Clem Kaddiddlehopper",
                job         => "questionable",
                has_reports => "no",
            },
        },
    };
    In the first employee record we see the fields fullname, position and management, and in the second record we see realname, job and has_reports. OK, so which is real and which is Memorex?

    In spite of that, if I am going to group data together without resorting to creating a Perl module, I'll use a HoH structure; just be sure you keep track of what you are putting in there.
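    One lightweight way to get some of that enforcement back, short of a full module, might be a small constructor sub that checks for required fields (a sketch; make_employee is invented for the example):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Build an employee record, dying if a required field is missing
    sub make_employee {
        my (%args) = @_;
        for my $field (qw(fullname position management)) {
            die "missing field '$field'" unless exists $args{$field};
        }
        return { %args };
    }

    my $fred = make_employee(
        fullname   => "Freddy Freeloader",
        position   => "panhandler",
        management => "no",
    );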

    You asked which is faster. I think that is a non sequitur in this case. If you are using field names such as client_goldman_phone and friends, you are burdening yourself as a programmer with coming up with ways to search all of the names, append subfields, and all sorts of cruft that gets in the way of good programming. Think maintainability. When I write Perl code, my assumption is that someday someone else may have to maintain it. If you are doing crufty things, then you are leaving behind crufty code for the folks inheriting it to deal with, and your name will be dragged through programming mud.

    Coding keys %{ $company->{employees} } is easier to write and much easier to read than some sort of fugazy  my @keys = grep /^employees_/, keys %company; in my humble opinion.
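    For comparison, here is roughly what each extraction might look like (a sketch; the data is invented, and the emulated version has to parse its own key names back apart):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $company = { employees => { fred => {}, clem => {} } };
    my %flat    = (
        employees_fred_position => "panhandler",
        employees_clem_job      => "questionable",
    );

    # HoH: the employee names are simply the keys of one sub-hash
    my @hoh_names = keys %{ $company->{employees} };

    # Emulation: filter the flat keys, then parse the names back out;
    # note this breaks if a name itself contains '_' (the ambiguity
    # moritz pointed out above)
    my %seen;
    for my $key (keys %flat) {
        $seen{$1} = 1 if $key =~ /^employees_([^_]+)_/;
    }
    my @flat_names = keys %seen;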

    HTH


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: multidimensional hash emulation vs hash of hashes
by NetWallah (Canon) on Aug 18, 2011 at 16:52 UTC
    If this is to be extendable, to maintain (my) sanity, I would immediately go OO:
    Create packages (classes) for Employee, Client, and, if necessary, a container class for "Company".

    IMHO - performance be damned. Computer overhead is negligible compared to maintenance and coding overhead.
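    As a rough illustration of that direction, a minimal plain-Perl sketch (class and field names invented for the example; a real design might use an OO framework instead):

    #!/usr/bin/perl
    use strict;
    use warnings;

    package Employee;

    # Minimal constructor with two fields
    sub new {
        my ($class, %args) = @_;
        return bless { id => $args{id}, position => $args{position} }, $class;
    }

    sub id       { $_[0]->{id} }
    sub position { $_[0]->{position} }

    package main;

    my $john = Employee->new(id => 13, position => 'slave');
    print $john->position, "\n";    # prints "slave"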

                "XML is like violence: if it doesn't solve your problem, use more."

Re: multidimensional hash emulation vs hash of hashes
by sundialsvc4 (Abbot) on Aug 18, 2011 at 17:08 UTC

    Quite frankly, I have begun to appreciate the phrase that "at 100 billion ops a second, no one can hear you scream." I think that you should simply focus on writing clear, easily-maintainable code, done in some well-documented fashion which avoids the proliferation of "thousands of statements scattered all over the code, every one of which must be changed in exactly the same way at exactly the same time." You should give a passing thought to the one and only thing which might actually slow this program down, namely: page faults. But also carefully consider the human effort required to maintain it going forward, weighed properly against the risk associated with inadequate testing or some maintenance oversight.