Array of Hash interface to an Array of Array?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

My question is related to memory requirements for a large number of hashes.

I DP reasonably large arrays of identically structured records. For naming convenience, I use a hash for each record with the field names as the hash keys

For example:

  my $people = [
    { name => 'fred', age => 25, height => 1.5 },
    { name => 'sally', age => 20, height => 1.4 },
  ];
  for my $person (@$people} {
    print $person->{name},"\n";
  }
[download]

The record structures are much larger, with typically 80-100 fields and about 4K of data. When dealing with datasets where the data is say 60MB, I am seeing memory usage of 600MB - I believe that this is related to the hash algorith and how many buckets are used etc.

I have searched CPAN, but not found anything - I was wondering if there are any magic Tie modules out there that let me pretend that I am working with an Array of Hash, but that are implemented under the hood as an Array of Array - where the fieldname gets efficiently mapped to an index?

Something like:

  my $people = [
    [ 'fred',  25,    1.5 ],
    [ 'sally', 20,    1.4 ],
  ];

  my $accessor = accessor->new(
     keys => ['name','age','height'],
     data => $people,
   );

  for my $person (@$people} {
    print $person->{name},"\n";
  }
[download]

Thanks in advance,

Jeff

Comment on Array of Hash interface to an Array of Array? Select or Download Code

Replies are listed 'Best First'.

Re: Array of Hash interface to an Array of Array?
by PodMaster (Abbot) on Sep 01, 2003 at 14:00 UTC

use constant name   => 0;
use constant age    => 1;
use constant height => 2;

my $people = [
    [ 'fred',  25,    1.5 ],
    [ 'sally', 20,    1.4 ],
  ];

for my $person( @$people ) {
   print $person->[name],"\n";
}
__END__
[download]

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.

Re: Re: Array of Hash interface to an Array of Array?

by jaa (Friar) on Sep 01, 2003 at 14:52 UTC

I like 2POE too!

Unfortunately consts dont quite hit the spot in this case - for example I have several arrays, with 'name' columns at different offsets, and the AofH and H's get passed around a number of modules, so there are scoping issues.

I was wondering more about some sort of class that I could overlay on an AoA to make the thingies pretend they were Hashes even though there would only be one hash mapping the names to an array offset...

Maybe there is a clue in Abigail's posting... off to google pseudo-hash...

Regards

Jeff

Re: Array of Hash interface to an Array of Array?
by Abigail-II (Bishop) on Sep 01, 2003 at 14:16 UTC

In Perl. it's a lot cheaper to have 2 arrays with 10000 elements each, than to have 10000 arrays with 2 elements each. So, if you have lots and lots of data, you should give it some thought on how you are going to organize it. Arrays take less memory than hashes, but I it's probably more worthwhile to reduce the number of aggregates than to replace hashes with pseudo-hashes (which are dead anyway).

You also might want to look into the Devel::Size CPAN module.

Abigail

Re: Re: Array of Hash interface to an Array of Array?

by jaa (Friar) on Sep 01, 2003 at 14:37 UTC

Input data: 40MB

Array of 67,000 Hash: 350MB

Array of 67,000 Array: 190MB

So there appears to be a reasonable memory saving to be had. My biggest production datasets are 7 to 10 times larger - what process memory usage this will translate into is hard to anticipate, and difficult to test at the moment, but I would make working assumptions of 2G vs 3G - and I would rather the 160M - 1G was used for the DB cache than the Perl Hash.

Regards,

Jeff

Re: Array of Hash interface to an Array of Array?
by NetWallah (Canon) on Sep 01, 2003 at 13:59 UTC

I suggest that you leverage the database methods, particularly DBI to manage accessing and storing this information. This will make your code easier to read, and a lot more scalable.

Re: Re: Array of Hash interface to an Array of Array?

by jaa (Friar) on Sep 01, 2003 at 14:18 UTC

Unfortunately the DP in this instance cannot be expressed in SQL - without going into detail, it is complex and iterative.

Regards,

Jeff

PS My favourite SQL shortcoming is:

  select 
    name, sum(value) as value
  from mytable
  group by name
  order by abs(value) desc
[download]

I only know of a single SQL engine that will perform this query correctly

Re:x3 Array of Hash interface to an Array of Array? (grouping in SQL)

by grinder (Bishop) on Sep 01, 2003 at 14:48 UTC

My favourite SQL shortcoming is:

OK, at the risk of making a fool of myself, I'll bite.

I believe that that statement is syntactically incorrect. If it works in one engine, I would say it is broken. (Which one are we talking about, by the way)?

When you say sum(value) as value, in your order by clause, you can't use value (i.e. the right hand side of the selected item clause) as the aggregator. You must use sum(value) (the left hand side of the selected item clause).

I just whipped up a sample table in Postgresql. It chokes on your example, but renders the correct results on mine. Consider:

select * from foo;
 name | value 
------+-------
 foo  |   -10
 foo  |    -6
 foo  |    12
 foo  |    34
 bar  |     6
 bar  |    23
 bar  |    76
 rat  |    -6
 rat  |   -60
(9 rows)
[download]

and the result:

select name, sum(value) as value from foo group by name order by abs(s
+um(value)) desc;
 name | value 
------+-------
 bar  |   105
 rat  |   -66
 foo  |    30
(3 rows)
[download]

Looks good to me. Now, it may well be that you can use what I term "the right hand side" in group by, but I think I'll continue using the left hand side. Even if it is clunky and redundantly verbose, at least I know what's going on.

And I have been known to write Perl scripts that generate selected aggregates paired up with corresponding group by clauses just to ensure that it's all correct and avoid breaking the DRY principle.

[reply]
[d/l]
[select]

Re: Re:x3 Array of Hash interface to an Array of Array? (grouping in SQL)

by jaa (Friar) on Sep 01, 2003 at 15:31 UTC

Re: Re: Re:x3 Array of Hash interface to an Array of Array? (grouping in SQL)

by mildside (Friar) on Sep 02, 2003 at 03:51 UTC

Some notes below your chosen depth have not been shown here

Re: Array of Hash interface to an Array of Array?
by DrHyde (Prior) on Sep 01, 2003 at 15:14 UTC

"WARNING: This section describes an experimental feature. Details may change without notice in future versions."

However, you probably don't want to do that either. I think you should look at tie()ing the hash to a dbm file. I prefer using GDBM_File in most situations, but you should look at the handy comparison chart in the AnyDBM_File docs before choosing a dbm implementation.

In combination with MLDBM you should be able to work with your data, with a remarkably low memory requirement.

Re: Re: Array of Hash interface to an Array of Array?

by jaa (Friar) on Sep 01, 2003 at 16:41 UTC

Thanks for the pointers...

Pseudo-hashes only sound like half of what I want - is an array of 350,000 PHs really much more efficient than an array of 350,000 Hashes? Abigail alluded to more efficient use of memory arising from reducing the number of structures.

I think that what I really want is a single something that stores data efficiently - eg an AofA or even an array of packed strings, or even a single huge string, or chunked pieces of string... possibly qubit encoded on chromium atoms for compactness...(slaps head twice, breathes deeply, zen-like calm returns...)

...that provides a transparent Array like interface for accesing collection elements and a Hash like interface to access properties of individual elements.... mmmm, Yet Another Database And Yes Another Database Approach (YADAYADA) is about to be born?... not 8-)

I can see how tieing this into a flat-file db could further save memory, I already use Tie::File arrays to process incoming flat-files.

scurries off to check it all out...

Thanks again to all,

Jeff

Back to Seekers of Perl Wisdom