in reply to How to create a compact data structure?
Reducing the storage requirement of the records will only get you so far.
Here's an (approximate) table of storage usage for the various options:
| record type | 1e5 records (~MB) | 1e6 records (~MB) | 2e6 records (~MB) | ~ per record (bytes) |
|---|---|---|---|---|
| HoHs | 25 | 240 | 480 | 250 |
| HoAs | 22 | 210 | 420 | 220 |
| HoPVs (5 bytes) | 15 | 129 | 261 | 136 |
| HoPVs (3 bytes) | 15 | 120 | 245 | 128 |
| HoIVs | 11 | 90 | 180 | 94 |
It was my fault! I apologise for the screwiness of the above table; it displays just fine until PM gets its hands on it. PM seems to think there is an imbalanced <tr> tag somewhere, but it isn't in the HTML I posted!
As you can see, moving to arrays doesn't save much. Moving to scalars saves considerably more, but the type of scalar used is important.
If you attempt to save space by packing the fields, most of the overhead is in the PV itself, so there is little more to be gained from using smaller representations for your fields. I.e. using a template of 'nc' makes a negligible difference over using 'Nc'.
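To make that concrete, here's a minimal sketch (the counts 123_456 and 12_345 are arbitrary values of my own): the narrower template shrinks the packed payload from 5 bytes to 3, but each record is still one PV carrying the same fixed per-scalar overhead, which is where most of the memory goes.

```perl
use strict;
use warnings;

# Compare packed payload sizes: a 32-bit vs a 16-bit count, each with an
# 8-bit flag. The per-PV overhead is the same either way.
my $wide   = pack 'Nc', 123_456, 1;    # 'N' = 32-bit count, 'c' = 8-bit flag
my $narrow = pack 'nc',  12_345, 1;    # 'n' = 16-bit count, 'c' = 8-bit flag

print length $wide,   "\n";    # 5
print length $narrow, "\n";    # 3
```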
Even if you decided that you could get away with storing only 3 bits (1 for the condition flag and 2 to represent 'none seen', '1 seen', 'more than 2 seen'), you'd still need to store a minimum of 1 byte, and if you store that as a PV, you'd still need ~125 bytes per record.
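For what it's worth, here's one hypothetical layout for those 3 bits within the single byte (the bump() helper and the bit positions are my own illustration, with the 2-bit counter saturating at its top value):

```perl
use strict;
use warnings;

# Bit 2 holds the condition flag; bits 0-1 hold a saturating counter.
sub bump {
    my( $byte, $condition ) = @_;
    $byte |= 0b100 if $condition;    # set the flag bit (sticky)
    my $seen = $byte & 0b011;
    $seen++ if $seen < 2;            # saturate the 2-bit counter
    return ( $byte & 0b100 ) | $seen;
}

my $rec = 0;
$rec = bump( $rec, 0 );    # one seen, flag clear -> 0b001
$rec = bump( $rec, 1 );    # counter saturated, flag set -> 0b110
print $rec, "\n";          # 6
```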
If you can represent your fields entirely numerically, then avoiding pack will allow you to use IVs instead of PVs, and that yields the greatest saving of all. In my example, I used the sign bit of the count to store the condition flag.
Using a HoIVs allows a 65% memory saving compared to a HoHs, with a marginal loss of performance. It becomes important not to do anything that would cause the IVs to be converted to PVs (like printing them out), as this would negate the memory saving. This can be avoided, for example, by copying the scalar to another local scalar and then printing that.
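Here's the sign-bit encoding in isolation (I use -abs() when setting the flag so it stays set if the condition fires more than once, and I copy before printing so the IV stored in the hash is never stringified):

```perl
use strict;
use warnings;

sub sgn{ ( $_[ 0 ] || 0 ) < 0 ? -1 : 1 }

my %records;
my $key = 'ABCDEFGH';

$records{ $key } += sgn( $records{ $key } );    # count = 1
$records{ $key } += sgn( $records{ $key } );    # count = 2
$records{ $key } = -abs( $records{ $key } );    # condition seen: set the flag
$records{ $key } += sgn( $records{ $key } );    # count = 3, flag preserved

# Copy to a local scalar and print the copy, so the IV in the hash
# is not upgraded to a PV by the stringification.
my $copy  = $records{ $key };
my $count = abs $copy;
my $flag  = $copy < 0 ? 1 : 0;
print "$count $flag\n";    # 3 1
```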
Any greater savings would probably mean avoiding using the top level hash entirely, which can be done in some circumstances but requires knowledge of the nature of the keys. Essentially, can they be hashed in some way that will allow you to index into (say) a large scalar bit field?
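As a sketch of that idea (everything here is hypothetical: the djb2-style hash, the 1e6-slot table, and the sample keys; a real version would need a collision strategy):

```perl
use strict;
use warnings;

use constant SLOTS => 1_000_000;

# Map a key to a slot number (illustrative hash; collisions ignored).
sub slot {
    my $h = 5381;
    $h = ( $h * 33 + ord ) % SLOTS for split //, shift;
    return $h;
}

# One bit per slot in a single large scalar: the condition flag.
my $flags = '';
vec( $flags, slot( 'ABCDEFGH' ), 1 ) = 1;

print vec( $flags, slot( 'ABCDEFGH' ), 1 ), "\n";    # 1
print vec( $flags, slot( 'ZYXWVUTS' ), 1 ), "\n";    # 0
```

That keeps the flags for 1e6 keys in ~122KB rather than megabytes, at the cost of losing the counts and having to deal with collisions.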
The other alternative is to move the hash out of memory, with the performance hit that entails.
Be aware that using Devel::Size for assessing your memory requirements has its caveats.
It will only report the final size of the Ho*; it won't tell you how much extra intermediate storage was required in order to reach that size. Each time the hash reaches its current allocation limits, a new hash has to be allocated (~double the size of the old one, I think) and the data copied over. That means you often need ~half as much storage again as the final hash requires in order to build that hash.
Theoretically you can do a keys( %hash ) = NNN to avoid some of the reallocation, but I've never worked out how to calculate the value you should set it to.
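The mechanics look like this (the 2_000_000 here is an arbitrary guess, which is exactly the hard part):

```perl
use strict;
use warnings;

my %records;
keys( %records ) = 2_000_000;    # hint: preallocate at least this many buckets
                                 # (Perl rounds the count up to a power of two)

# Presizing allocates buckets, not keys, so the hash is still empty:
print scalar keys %records, "\n";    # 0
```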
Also, Devel::Size can itself use a substantial amount of memory in order to calculate the size of a structure, which precludes using external tools (top/Task Manager) to get an accurate picture of the overall memory used.
Test code:
(c&p the appropriate block of code to make your own measurements.)
```perl
#! perl -slw
use strict;

use constant {
    KEYLEN              => 8,
    ALPHA               => [ 'A' .. 'Z' ],
    CONDITION_FREQUENCY => 0.2,
    LIST_SIZE           => 1e6,
};

our $MAX ||= LIST_SIZE;

sub rndStr{ join'', @_[ map{ rand @_ } 1 .. shift ] }
sub property { rndStr( KEYLEN, @{+ALPHA} ) }
sub condition_is_true{ rand() < CONDITION_FREQUENCY ? '1' : '0' }

my %records;
foreach my $item ( 1 .. $MAX ) {
    my $key = property( $item );
    my( $count, $flag ) = unpack 'Nc', $records{ $key } || pack( 'Nc', 0, 0 );
    ++$count;
    $flag = 1 if condition_is_true( $item );
    $records{ $key } = pack 'Nc', $count, $flag;
}
printf 'check Memory: '; <STDIN>;

__END__
----------HoHs---------------
my %records;
foreach my $item ( 1 .. $MAX ) {
    my $key = property( $item );
    ++$records{ $key }{ count };
    if( condition_is_true( $item ) ) {
        $records{ $key }{ flag } = 1;
    }
}

c:\test>590773 -MAX=1e5
check Memory: 25 MB

c:\test>590773 -MAX=1e6
check Memory: 240 MB

c:\test>590773 -MAX=2e6
check Memory: 480 MB

----------HoAs--------------
use constant { FLAG => 0, COUNT => 1 };

my %records;
foreach my $item ( 1 .. $MAX ) {
    my $key = property( $item );
    ++$records{ $key }[ COUNT ];
    if( condition_is_true( $item ) ) {
        $records{ $key }[ FLAG ] = 1;
    }
}

c:\test>590773 -MAX=1e5
check Memory: 22 MB

c:\test>590773 -MAX=1e6
check Memory: 210 MB

c:\test>590773 -MAX=2e6
check Memory: 420 MB

--------HoIVs---------
sub sgn{ ( $_[ 0 ] || 0 ) < 0 ? -1 : 1 }

my %records;
foreach my $item ( 1 .. $MAX ) {
    my $key = property( $item );
    $records{ $key } += sgn( $records{ $key } );
    if( condition_is_true( $item ) ) {
        # -abs keeps the flag set if the condition fires more than once
        $records{ $key } = -abs( $records{ $key } );
    }
}

c:\test>590773 -MAX=1e5
check Memory: 11 MB

c:\test>590773 -MAX=1e6
check Memory: 90 MB

c:\test>590773 -MAX=2e6
check Memory: 180 MB

--------HoPVs (3 bytes)---------
my %records;
foreach my $item ( 1 .. $MAX ) {
    my $key = property( $item );
    my( $count, $flag ) = unpack 'nc', $records{ $key } || pack( 'nc', 0, 0 );
    ++$count;
    $flag = 1 if condition_is_true( $item );
    $records{ $key } = pack 'nc', $count, $flag;
}

c:\test>590773 -MAX=1e5
check Memory: 15 MB

c:\test>590773 -MAX=1e6
check Memory: 120 MB

c:\test>590773 -MAX=2e6
check Memory: 245 MB

--------HoPVs (5 bytes)---------
my %records;
foreach my $item ( 1 .. $MAX ) {
    my $key = property( $item );
    my( $count, $flag ) = unpack 'Nc', $records{ $key } || pack( 'Nc', 0, 0 );
    ++$count;
    $flag = 1 if condition_is_true( $item );
    $records{ $key } = pack 'Nc', $count, $flag;
}

c:\test>590773 -MAX=1e5
check Memory: 15 MB

c:\test>590773 -MAX=1e6
check Memory: 129 MB

c:\test>590773 -MAX=2e6
check Memory: 261 MB
```
Re^2: How to create a compact data structure?
by almut (Canon) on Dec 20, 2006 at 15:08 UTC |