Re: Array of Hash interface to an Array of Array?

Every data structure in Perl, whether it's a scalar, an array or a hash, carries some memory overhead (in addition to the memory needed to store the data itself). For a scalar, this is about 24 bytes. For hashes and arrays, it is 2 or 3 times that. And note that if you have an array with 10 elements, you pay overhead for one array and 10 scalars.

In Perl, it's a lot cheaper to have 2 arrays with 10000 elements each than to have 10000 arrays with 2 elements each. So, if you have lots and lots of data, you should give some thought to how you are going to organize it. Arrays take less memory than hashes, but it's probably more worthwhile to reduce the number of aggregates than to replace hashes with pseudo-hashes (which are dead anyway).
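To make the two layouts concrete, here is a minimal sketch. The names @rows, @id and @value and the sample values are made up for illustration; they are not taken from the original question.

```perl
use strict;
use warnings;

my $n = 10_000;

# Row-oriented: one outer array plus $n small inner arrays,
# so Perl has to allocate $n + 1 aggregates.
my @rows = map { [ $_, $_ * 2 ] } 1 .. $n;

# Column-oriented: the same values held in just two arrays.
my @id    = map { $_ }     1 .. $n;
my @value = map { $_ * 2 } 1 .. $n;

# A record is reassembled by using the same index in every column.
my $i = 41;
my ( $row_id, $row_value ) = @{ $rows[$i] };
my ( $col_id, $col_value ) = ( $id[$i], $value[$i] );

print "row: $row_id $row_value; column: $col_id $col_value\n";
```

Both layouts hold the same data. The column form trades some convenience (a record is no longer a single reference you can pass around) for far fewer aggregates.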

You might also want to look into the Devel::Size CPAN module, which reports how much memory a Perl data structure occupies.
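As a rough sketch of how that could be used to compare layouts, assuming Devel::Size is installed from CPAN (the variable names and sample data are hypothetical, and the byte counts will differ between Perl versions and builds):

```perl
use strict;
use warnings;
use Devel::Size qw(total_size);    # CPAN module, not part of core Perl

my $n = 10_000;

# 10000 arrays with 2 elements each
my @rows = map { [ $_, $_ * 2 ] } 1 .. $n;

# 2 arrays with 10000 elements each
my @columns = ( [ 1 .. $n ], [ map { $_ * 2 } 1 .. $n ] );

# total_size follows references, so the nested arrays and
# their scalars are included in the result.
my $row_bytes    = total_size( \@rows );
my $column_bytes = total_size( \@columns );

printf "10000 x 2 layout: %d bytes\n", $row_bytes;
printf "2 x 10000 layout: %d bytes\n", $column_bytes;
```

Devel::Size reports what Perl has allocated for the structure itself, which will not match the total process size shown by tools such as top.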

Abigail

Re: Re: Array of Hash interface to an Array of Array?
by jaa (Friar) on Sep 01, 2003 at 14:37 UTC

Some quick testing by simply changing from Array of Hash to Array of Array, and using Top to compare total process mem:

Input data: 40MB

Array of 67,000 Hash: 350MB

Array of 67,000 Array: 190MB

So there appears to be a reasonable memory saving to be had. My biggest production datasets are 7 to 10 times larger - what process memory usage this will translate into is hard to anticipate, and difficult to test at the moment, but I would make working assumptions of 2G vs 3G - and I would rather the 160M - 1G was used for the DB cache than the Perl Hash.

Regards,

Jeff
