kaatunut has asked for the wisdom of the Perl Monks concerning the following question:

Recently I decided to recode a C project of mine in perl. Though I was pleased to have another chance to code in my favourite language, I didn't get far before I ran into a problem... a big problem. Since the code is currently pretty small, I'll paste it instead of explaining:

sub readPlaque {
    my ($opt, @files) = @_;
    local *FILE;
    for my $fn (@files) {
        open(FILE, $fn)
            or (!$$opt{quiet} and print STDERR "couldn't open $fn\n"), next;
        my %plaq;
        chomp($plaq{time} = <FILE>);
        $plaq{plr} = {};
        $plaq{pos} = {};
        while (<FILE>) {
            chomp;
            # rank name level exp created age
            if (/^(\d+) (\w+) (\d+) (\d+) (\d+) (\d+)$/) {
                $plaq{plr}{$2} = $plaq{pos}{$1} = [$1, $2, $3, $4, $5, $6];
                #{ match([qw(rank name level exp created age)], [$1,$2,$3,$4,$5,$6]) };
                $c++;
            } else {
                print "syntax error in $fn line $. '$_'\n";
            }
        }
        close(FILE);
        chomp($ts = ctime($plaq{time}));
        print "Read plaque ", scalar(@plaque), ": $fn ($ts)\n";
        push @plaque, { %plaq };
    }
}

This function is called on 85 files, and $c is 43676 afterwards. According to top, the memory consumption difference between before and after is about 22 megabytes. Since @plaque is the only structure that survives function exit, and the function never allocates huge amounts of memory only to release it, I have to suppose it's @plaque, with its fluff and 43676 pieces of actual data, that eats these 22 megabytes, which works out to about 500 bytes per element (of 6 scalars). This seems a tad... excessive, especially considering the data in the file has known boundaries.

What can I do to limit the memory usage? I'll be accessing the data, all of it, frequently, so I really, really would like to avoid going for Storable/FreezeThaw with its constant, slow, and cumbersome freezes and thaws.

P.S. The commented-out part with 'match' concerns a related issue. For convenience I'd really like to access elements by names like rank, exp and created instead of by index, but that'd eat even more memory. 'match' is the inverse of keys+values, reconstructing a hash from a list of keys and a list of values. I suppose I could make each plaque an object and access fields with subs returning a reference, but that'd screw up operator precedence and look ugly...
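A minimal sketch of such a 'match' helper, assuming it does no more than pair a list of keys with a list of values (so the commented-out { match(...) } in the code above would build the hashref):

sub match {
    my ($keys, $values) = @_;
    my %h;
    @h{@$keys} = @$values;    # hash-slice assignment pairs keys with values
    return %h;                # returns a flat key/value list
}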

Replies are listed 'Best First'.
Re: My scalars swell
by Fletch (Bishop) on Oct 20, 2001 at 18:46 UTC

    Whenever you suspect a memory leak, it's very helpful to mention which version of perl you're using. Also make sure you're using use strict.

    You haven't lexically scoped $ts, not that that would cause bloat this bad. Also, a minor style nit: since you've already declared %plaq lexically, you could just push a reference to it rather than building a fresh hashref just to push:

    ... push @plaque, \%plaq;
      Ah, yes, I wasn't actually suspecting a memory leak; I figured this was normal perl operation. I did use strict; it had no complaints (beyond a couple of 'our'ed globals). The perl version is 5.6.0, straight from the RH7 RPM.
Re: My scalars swell
by clintp (Curate) on Oct 20, 2001 at 19:58 UTC
    @plaque is NOT the only structure that survives the function exit. $c and $ts do. But that's not important, I don't think.

    Unless you've stumbled on one of the myriad 5.6.0 memory leaks, I'd say you're looking at about the right memory usage. You say each "element" eats 500 bytes. Each element here is a hashref with keys and values... wait... wait... each element is a hash-of-hashes with keys and values. So you have the overhead of each of those HoHs, plus the keys, plus the values. I'd say that 500 bytes per element of @plaque is reasonable. Very reasonable.

      @plaque is a list of 86 hashes, each of which has 3 keys (3 scalars) associated with three scalars (two being refs), each ref referring to a hash of about 507 keys (507 scalars), each key associated with a scalar which is a reference to a list of 6 scalars. Therefore,

      86 * (3 + 507 + 507*6) = 305472 scalars. Memory usage after forced numification is 12444 kilobytes, which yields about 40 bytes per scalar (ignoring the overhead of the lists and hashes themselves, which I cannot measure for lack of knowledge). That sounds pretty much the same as much simpler tests yield. But the question remains...

      40 bytes per scalar?! I hardly need 4 bytes per scalar...

      Ah well, I guess I'll have to either forget it or go play with vec and substr, which is neither a nice nor a clean solution. I wonder if perl6 is going to do something about this.
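      In the same spirit as the vec/substr idea, a minimal sketch using pack/unpack: store the five numeric fields of each record in one packed string instead of an array of six scalars (the 'N5' field layout is an assumption for illustration):

      my %packed;    # name => packed (rank, level, exp, created, age)

      sub store_rec {
          my ($rank, $name, $level, $exp, $created, $age) = @_;
          $packed{$name} = pack('N5', $rank, $level, $exp, $created, $age);
      }

      sub fetch_rec {
          my ($name) = @_;
          my ($rank, $level, $exp, $created, $age) = unpack('N5', $packed{$name});
          return ($rank, $name, $level, $exp, $created, $age);
      }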

        While the space usage by Perl may astound you, once you start accounting for it, it turns out to be much more reasonable than you might think. For instance, a basic scalar value has to have a pointer to a data structure, keep a reference count, and keep a bunch of flags to know what it currently is. That's 12 bytes of overhead for general behaviour-related metadata, and we don't even have the data yet! Plus the things you are discounting, arrays and hashes and the like, are all non-trivial data structures which involve lots of associated metadata, and then wasted space for internal buffering.
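        To see that metadata for yourself, the core Devel::Peek module will dump a scalar's innards (a sketch; the output shown is approximate and varies by perl version and build):

        use Devel::Peek;
        my $n = 42;
        Dump($n);
        # prints (to STDERR) something like:
        #   SV = IV(0x...) at 0x...
        #     REFCNT = 1                 <- the reference count
        #     FLAGS = (PADMY,IOK,pIOK)   <- what the scalar currently is
        #     IV = 42                    <- and only then the data itself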

        For more on this, you should take a look at Perl 5 Internals by Simon Cozens. And after that, dive into perlguts.

        Perl isn't C. But the nice features don't come for free.

Re: My scalars swell
by jackdied (Monk) on Oct 20, 2001 at 21:12 UTC
    Try coercing the numbers to real numbers while storing them.
    [0+$1,$2,0+$3,0+$4,0+$5,0+$6]
    I've noticed (in perl 5-point-something) that this reduces memory consumption by a noticeable amount. It won't save your skin, but it will shave a bit off the top.

    In the brief test below, the pure-numbers version takes up 5656k; the stringified version takes up 9560k.

    #!/usr/bin/perl
    use strict;

    my @foo = (0..100000);
    foreach my $f (@foo) {
        # comment/uncomment the next line to change behaviors
        $f = '' . $f;
    }
    sleep(10);
    exit(0);
    -jackdied
      I'd like to alter your test just a little bit to prove a point. Trying to code to the implementation of a language just isn't wise. Observe:
      my @foo = (0..100000);
      my @t;
      foreach my $f (@foo) {
          # comment/uncomment the next lines to change behaviors
          $f .= "";
          push(@t, $f + 0);
      }
      system("ps auxwwww | grep perl");
      sleep(10);
      exit(0);
      The problem is that the memory saved by trying to coerce the scalar back to a number doesn't always materialize. For example, if the scalar has been used in string context before, you don't get the memory back:
      • With "push(@t, $f+0)" only: Process size: 12716
      • With "push(@t, $f."")" only: process size: 15432
      • With "$f.=""; push(@t, $f+0)" : process size 14276
      In this case, I believe, perl remembers having done a number->string conversion and caches the value to avoid doing it again (or have I got that backwards?). At any rate, trying to outsmart the interpreter doesn't always work, and it breeds awful cargo-cult beliefs.
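      You can watch that caching happen with the core Devel::Peek module (a quick sketch; the output is abridged here and varies by perl version and build):

      use Devel::Peek;
      my $n = 42;
      Dump($n);        # SV = IV: only the integer slot is populated
      my $s = "$n";    # use $n in string context once
      Dump($n);        # SV = PVIV: the string "42" is now cached in the PV
                       # slot alongside the IV, and that memory stays in use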
        I agree it's bordering on black magic, but I've used it to good effect to squeeze the last ounces of memory out of my machine. Does anyone else do the Dr Dobbs puzzle every month?

        I tried doing a super search and a google search, but found nothing. Is there a pragma or hint that you can pass to perl to tell it you are just using really plain integer scalars?

Reduce the number of structures
by jackdied (Monk) on Oct 21, 2001 at 01:01 UTC
    Non-ideal for maintenance, but if you just want to reduce the size of the running program, you could cut out the individual arrays for each of the players: create one large array and store only an index into it. A little helper sub to retrieve the six values by name or ranking would be good for readability.

    This will remove the overhead of the array ref for each player.
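    A rough sketch of that idea (all names here are hypothetical): keep every player's six fields in one flat array, and record only each record's starting offset.

    my @all_fields;       # rank, name, level, exp, created, age, repeated
    my %offset_by_name;   # a parallel %offset_by_rank would work the same way

    sub add_player {
        my @rec = @_;     # the six fields of one player
        $offset_by_name{$rec[1]} = scalar @all_fields;
        push @all_fields, @rec;
    }

    sub player_field {
        my ($name, $i) = @_;    # $i: 0=rank, 1=name, ... 5=age
        return $all_fields[$offset_by_name{$name} + $i];
    }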

    The only comments I could find on pre-allocating perl arrays date from 1998 (google groups search). Your array is small enough (100k-200k items) that I don't think it matters a lot, but you could try:

    my @big_array;
    $#big_array = $ESTIMATED_FINAL_SIZE;  # declared at top of program
    # do stuff here
    -jackdied