in reply to System out of memory while storing huge amount of data

Sometimes using a simpler data structure is very effective in conserving memory. For example, creating an array containing 1 million fairly simple hashes:

perl -e" my @a; push @a, { name=>'fred', surname=>'bloggs', age=>'ancient', dob=>'0/0/0000', empno=>1234567890 } for 1..1e6; sleep 10"

uses 673MB of RAM on my 64-bit system.

However, storing exactly the same information using strings:

perl -e" my @a; push @a, join( $;, name=>'fred', surname=>'bloggs', age=>'ancient', dob=>'0/0/0000', empno=>1234567890 ) for 1..1e6; sleep 10"

only takes 87MB.

Using strings rather than hashes during the data accumulation phase can often mean that you can store 6 or 7 times as much data in the same memory. The strings are easily turned back into hashes on a case-by-case basis as required: my %temp = split $;, $array[ $i ];
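A minimal sketch of that round trip, assuming no key or value ever contains the $; separator ("\034" by default). The helper names pack_record and unpack_record are hypothetical and not part of the test above:

```perl
use strict;
use warnings;

my $SEP = $;;    # subscript separator, "\034" unless it has been changed

# Hypothetical helpers. Assumes no key or value contains $SEP or is undef.
sub pack_record { my %rec = @_; return join $SEP, %rec }

sub unpack_record {
    my ($str) = @_;
    # Limit of -1 keeps trailing empty fields, so an empty last value survives.
    my %h = split /\Q$SEP\E/, $str, -1;
    return \%h;
}

my @a;
push @a, pack_record(
    name => 'fred', surname => 'bloggs', age => 'ancient',
    dob  => '0/0/0000', empno => 1234567890,
) for 1 .. 1000;

# Rebuild a hash only for the record currently being worked on.
my $rec = unpack_record( $a[42] );
print "$rec->{name} $rec->{surname}, empno $rec->{empno}\n";
# prints "fred bloggs, empno 1234567890"
```

The key names are stored in every string here, exactly as in the one-liner. If all records share the same fields, storing only the values in a fixed order would presumably shrink the strings further.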

Whilst there may be some performance penalty incurred as a result of building the strings then converting them back to hashes on demand, this is often far less than the performance hit of moving to disk-based storage.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^2: System out of memory while storing huge amount of data
by Marshall (Canon) on Oct 28, 2010 at 15:21 UTC
    I ran your test on my 32-bit machine. The AoH took 411 MB. I then tried an AoA, which wasn't much better at 336 MB, so I conclude that the hash overhead in terms of storage isn't as much as one might imagine! Finally I ran the string version and, as you observed, it took 52 MB, way smaller than either of the above.
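
    For reference, the AoA variant might look something like the sketch below. The exact code used for that run is not shown, so the field order and the constant names are assumptions made purely for illustration:

```perl
use strict;
use warnings;

# Assumed field positions; any fixed order would do.
use constant { NAME => 0, SURNAME => 1, AGE => 2, DOB => 3, EMPNO => 4 };

my @a;
# Each record is an anonymous array instead of an anonymous hash,
# so the key strings are no longer stored per record.
push @a, [ 'fred', 'bloggs', 'ancient', '0/0/0000', 1234567890 ] for 1 .. 1000;

print $a[0][NAME], ' ', $a[0][SURNAME], "\n";
```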

    I attribute the difference in MB numbers to greater size of pointers on your 64 bit machine, not to any difference in the benchmark.

      I attribute the difference in MB numbers to greater size of pointers on your 64 bit machine,

      Indeed. 64-bit pointers cost heavily.

      Especially as (today and for the immediate future), on any machine costing less than something like $250k, the top 24 bits and usually more will be zeros. Even worse when you consider that the bottom 4 bits will also be 0.

      In an XS module I'm writing, I'm experimenting with storing 64-bit pointers in 32-bit fields. By right-shifting 4 places, I get 32-bit values that can cover 64GB, which is as much memory as I'm going to have in the next, say, 10 years or so.
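
      The arithmetic can be sketched in plain Perl rather than XS. This assumes a perl built with 64-bit integers and 16-byte-aligned addresses (so the bottom 4 bits really are zero); the compress and expand names and the sample address are made up for illustration:

```perl
use strict;
use warnings;

# Assumption: addresses are 16-byte aligned, so the low 4 bits carry no information.
sub compress { my ($addr) = @_; return $addr >> 4 }    # fits in 32 bits below 64GB
sub expand   { my ($c)    = @_; return $c << 4 }

# 2**32 distinct values, each standing for one 16-byte slot:
printf "addressable: %d GB\n", ( 2**32 * 16 ) / 2**30;    # prints "addressable: 64 GB"

# Made-up example: the highest 16-byte-aligned address below 64GB.
my $addr = 64 * 2**30 - 16;
my $c    = compress($addr);
printf "compressed 0x%08X, restored 0x%X\n", $c, expand($c);
```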

