in reply to How to Calculate Memory Needs?

Thanks for all of your suggestions and pointers. In the interest of wrapping things up, and sharing what I learned in the process, here's what I did:
1. Read up on GTop.
The README on CPAN says,
this package is a Perl interface to libgtop: http://home-of-linux.org/gnome/libgtop/
(The link doesn't point anywhere, BTW.) But my platform choices are Windows 2000 and Mac OS X - neither of which has gtop or libgtop on it. So, to use this module, I'd have to find a gtop port, or find a Linux box. Either of those is certainly possible, but far more effort than I'd hoped to expend. The link joealba provided has great usage samples for GTop, for anyone interested. I was all ready to install it! :-(
Next option...
2. PERL_DEBUG_MSTATS
My Perl was not compiled with support for this.
Next option...
3. Store the hash in a file
This would also require re-designing the script. Until I have some idea that the limitations of doing it all in RAM are too restrictive, I'd rather not go through that work. Definitely my next choice if this doesn't scale well.
Next option...
4. If I can't use GTop, what could I use instead...
Add sleep 10; at the beginning of the script, and again at the end. This gives me time to look at the memory used by perl.exe in the Windows 2000 Task Manager: once before the program builds any data structures (at the beginning), and once after it's built them all - and while they're still in scope - at the end. Then, I can take Fletch's advice and 'subtract.'
This did the trick.
The difference in memory taken up by perl.exe at the beginning of the script and after building the hash (based on 160 users and 1600 books) is 228K. Seems pretty teeny. At the moment, I have 61MB of free physical RAM. If this scales linearly (and I believe it does), that allows me to handle about 42,500 users before even going to swap (theoretically). So this will scale just fine for the volumes I'm likely to encounter.
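For anyone who wants to try the same trick, here's a minimal sketch of the sleep-and-measure approach. The hash-building loop is a made-up stand-in for the real script (the 160 users x 10 books each just reproduce the 1600-entry count from the post):

```perl
use strict;
use warnings;

# Pause so you can note perl.exe's memory in Task Manager (the baseline).
sleep 10;

# Stand-in for the real data structure: 160 users, 1600 books total.
my %books_by_user;
for my $user (1 .. 160) {
    for my $book (1 .. 10) {
        $books_by_user{"user$user"}{"isbn-$user-$book"} = 1;
    }
}

# Pause again; the difference from the baseline is the hash's cost.
sleep 10;
```

Crude, but it needs nothing beyond core Perl and a task monitor, which is the whole point.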

So, I ended up going with a combination of Fletch's and joealba's 'look at how much memory you're already using and do the arithmetic' approach and perrin's 'don't worry about precision, just play with it and be happy with a rough idea' approach.

Many thanks for all of your suggestions. This was very helpful.
-- mrbbking

Replies are listed 'Best First'.
Re: Re: How to Calculate Memory Needs?
by joealba (Hermit) on Apr 06, 2002 at 05:04 UTC
    228,000 bytes / 1600 books = 142.5 bytes used per book on average for storage in the hash.

    Subtract from 142.5 the average number of characters in the user ID AND the number of characters in an ISBN number. Then, you've got a reasonable estimate of the number of bytes of overhead that are used for each element in your hash.

    Then, you can make some good guesses about the upper memory bound on your program!
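    That arithmetic is easy to sketch in a few lines of Perl. The key lengths below are made-up assumptions for illustration (an 8-character user ID and a 10-digit ISBN); plug in your real averages:

```perl
use strict;
use warnings;

my $bytes_per_book = 228_000 / 1_600;   # 142.5 bytes per book, from the post above
my $avg_userid_len = 8;                 # assumed average user-ID length
my $isbn_len       = 10;                # ISBN-10 is ten characters

# What's left after the payload characters is rough per-element overhead.
my $overhead = $bytes_per_book - $avg_userid_len - $isbn_len;
printf "Roughly %.1f bytes of overhead per hash element\n", $overhead;
```

    Multiply the per-element cost back out by your projected user and book counts to get the upper memory bound.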
Re: Re: How to Calculate Memory Needs?
by marcos (Scribe) on Jun 18, 2002 at 09:15 UTC
    I'm a bit late in this thread, but I have a similar problem: evaluate the size of a big hash that I have in memory.
    I searched PerlMonks for some clue on this issue, and I found this thread. I've read your ideas, and I found them very interesting. Anyway, I came up with a different approach, and I'd like to share it with you so that you can give me feedback.
    The idea is quite simple: I build the hash, use Storable::freeze (see the Storable module) to freeze it in memory, and then evaluate the length of the frozen string. I got this idea from the Storable man page; here is an example from it:
    use Storable qw(store retrieve freeze thaw dclone);
    %color = ('Blue' => 0.1, 'Red' => 0.8, 'Black' => 0, 'White' => 1);
    $str = freeze(\%color);
    printf "Serialization of %%color is %d bytes long.\n", length($str);

    What do you think of this approach?

    marcos
      I don't know either way, but I would think there are differences between the size of the in-memory data structures and the stored versions. Even so, it would probably give you a good ballpark figure - though you would use more memory, since you'd have both structures in memory at once.

      -Lee

      "To be civilized is to deny one's nature."
Re: Re: How to Calculate Memory Needs?
by shotgunefx (Parson) on Jun 18, 2002 at 09:35 UTC
    If you are accessing the data serially (and even if you're not), a tied hash with MLDBM would be a good choice. The modifications would probably be simple. The biggest issue is that you can't directly assign to sub-elements of a complex data structure:
    $MLDBM_Hash{$user}{$subkey} = $value;   # Won't work.
    my $hash = $MLDBM_Hash{$user};          # Will work.
    $hash->{$subkey} = $value;              # Modify
    $MLDBM_Hash{$user} = $hash;             # Reassign
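    For completeness, here's a self-contained sketch of the tie setup the snippet above assumes. The filename, user ID, and book data are made up; it uses DB_File as the underlying DBM and Storable as the serializer, per MLDBM's documentation:

```perl
use strict;
use warnings;
use MLDBM qw(DB_File Storable);   # underlying DBM, then serializer
use Fcntl qw(O_CREAT O_RDWR);

# Tie the hash to a file; 'books.db' is a made-up name.
tie my %MLDBM_Hash, 'MLDBM', 'books.db', O_CREAT | O_RDWR, 0640
    or die "Cannot tie books.db: $!";

# Fetch the whole per-user value, modify the copy, then reassign
# so MLDBM serializes the change back to disk.
my $hash = $MLDBM_Hash{'mrbbking'} || {};
$hash->{'0596000278'} = 'Programming Perl';
$MLDBM_Hash{'mrbbking'} = $hash;

untie %MLDBM_Hash;
```

    The fetch-modify-reassign dance is the price of keeping the structure on disk instead of in RAM.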


    -Lee

    "To be civilized is to deny one's nature."