eXile has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm having a problem with a large perl program using up even larger amounts of memory. I've been trying to make sense of this using Devel::Size and Devel::Symdump, but from that I learned that the symbol table accounts for only a small part of the total memory the process is using (see Devel::Symdump without Devel::Symdump interference, where I learned that the symbol table doesn't cover lexical scopes (broquaint++)).

Since this post I've been trying to use PadWalker to peek into the lexical scopes I'm in, not very successfully, because it seems to 'hang' when I use it under perl -d. The other problem is that (as far as I can see) it accesses only the scopes enclosing PadWalker's point of call, not all the lexical scopes ('scratchpads') that might have variables or other beasts using up memory.

I've been reading up on perlguts and related literature: every variable is stored in a structure (IV, SV, CV and friends), but I haven't encountered much information on the 'superstructure' that keeps these values organised, or on how to get information out of it.

Is there a way of breaking up the total memory usage of a running perl process (or a core dump) into variables or other manageable units? Perl doesn't return the memory freed when a variable or object is destroyed. Is it possible to see how much of this 'allocated but unused' memory a running process is keeping occupied? A lot of info can be found on CPU-usage statistics (i.e. profiling); any pointers to memory-usage-related info are appreciated. If my view of how perl's memory usage is organised is wrong, I'd like to know that as well.

Update: Thanks a lot everybody for your replies, I've learned a lot from it. Thanks gmpassos for pointing out B::TerseSize.

Replies are listed 'Best First'.
Re: Memory usage breakup
by Zaxo (Archbishop) on May 01, 2004 at 01:02 UTC

    You could build or obtain a perl with DEBUGGING enabled, to make the -D options work. The memory accounting of -Dm also requires that perl be built with perl's own memory allocator rather than the system malloc.

    See perlrun for a list of the -D flags.

    It's hard to say without seeing code, but one usual suspect for memory running away in long-running perl programs is a long-lived hash (global or lexical): if data is added to it on each cycle and never deleted . . .

    After Compline,
    Zaxo

Re: Memory usage breakup
by perrin (Chancellor) on May 01, 2004 at 02:46 UTC
    Perl doesn't return memory that is freed after a variable/object etc. is destroyed.

    It does if you undef the variable. If you have a long-running program that sucks large amounts of data into a scalar at any point, it's a good idea to undef the scalar when you finish with it (at least it is if you are having memory problems).

      Interestingly, my copy of Perl seems to return about half of the memory when you undef a variable, at least on Linux:
      sub showmem {
          system("cat /proc/$$/status | grep '^Vm' | sed -e 's/\$/ ($_[0])/'");
          print "\n";
      }
      showmem("start");
      my $x = "A" x 10_000_000;
      showmem("allocated");
      undef $x;
      showmem("freed");
      produces:
      VmSize:	    2808 kB (start)
      VmLck:	       0 kB (start)
      VmRSS:	    1144 kB (start)
      VmData:	     240 kB (start)
      VmStk:	      28 kB (start)
      VmExe:	     684 kB (start)
      VmLib:	    1528 kB (start)
      
      VmSize:	   22348 kB (allocated)
      VmLck:	       0 kB (allocated)
      VmRSS:	   20700 kB (allocated)
      VmData:	   19780 kB (allocated)
      VmStk:	      28 kB (allocated)
      VmExe:	     684 kB (allocated)
      VmLib:	    1528 kB (allocated)
      
      VmSize:	   12580 kB (freed)
      VmLck:	       0 kB (freed)
      VmRSS:	   10936 kB (freed)
      VmData:	   10012 kB (freed)
      VmStk:	      28 kB (freed)
      VmExe:	     684 kB (freed)
      VmLib:	    1528 kB (freed)
      

        Tying down how much memory a given program will use, and what if any of that memory will be recycled, either internally by perl or back to the OS, is extremely complicated (as well as being highly dependent on the OS, the perl version, and the individual perl build).

        For example, consider these two one-liners:

        P:\test>perl -e" { for( 1 .. 100_000 ) { $x[ $_ ] = ' ' x 1000; $x[ $_ ] = undef; } <STDIN>; } <STDIN>;"

        In this first example, each element of the 100_000-element global array @x is assigned a 1000-byte value, which is then immediately 'freed' by undefing it. At the end of the loop (the first prompt), 100+ MB is allocated to the process: space for 100_000 elements of 1000 bytes plus the overhead of perl's array and scalar structures, even though only one element of the array has usable space allocated at any given time.

        P:\test>perl -e" { my @x; for( 1 .. 100_000 ) { $x[ $_ ] = ' ' x 1000; $x[ $_ ] = undef; } <STDIN>; } <STDIN>;"

        The same program, except that the array is now lexically scoped. When the first prompt is reached after the loop completes, again 100+ MB is in use, meaning that 99_999 elements' worth of discarded (undef'd) space is lying around unusable and unused. However, once the second prompt is reached, i.e. after the scope in which @x was declared has exited, the memory used by the process (on my system) drops to 12 MB.

        With care and motivation it is possible to force perl to re-use discarded memory (and even return some of it to the OS under Win32), but every attempt I've made to formulate a general strategy for doing either has fallen on stony ground. I can do it on a case-by-case basis for many applications, and I have begun to recognise cases where I am reasonably sure I can reduce the memory requirements through fairly simple steps, but inevitably there are always exceptions to the rules of thumb I use.

        Unfortunately, the exceptions are too common to make the rules of thumb viable for anything other than cases of extreme need.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        Actually, to the best of my knowledge it goes back into the pool that perl allocates from. I'm surprised that any of it appears to be freed back to the OS.

        If you'd like to find out more about it, I remember a post on here by elian talking about the internals of this process.

        Update: as pointed out by sgifford, the repeat operator isn't getting constant-folded, so most of this is wrong - it is the pad entry for the subexpression mentioned in the last two paragraphs that is grabbing the memory.
        End update

        The 10MB string in $x is freed by the undef $x, but there is another copy of the string attached to the optree (the compiled code), since constant folding will replace

        my $x = "A" x 10_000_000;
        with
        my $x = "AAAAA..[10_000_000 long]..";

        You can test this by putting the code in an anonymous sub, and then freeing it:

        our $x;
        my $code = sub { $x = "A" x 10_000_000 };
        showmem("start");
        &$code;
        showmem("allocated");
        undef $x;
        showmem("var freed");
        undef $code;
        showmem("code freed");

        Running that here shows VMSize at each step of:

        start: 2896 kB
        allocated: 22436 kB
        var freed: 12668 kB
        code freed: 2900 kB
        

        However, I feel that it is quite rare to have big constants like this in real code, so the simplistic approach of "fold anything that's constant" is still probably the right thing for perl to do.

        Unfortunately you cannot sidestep this just by replacing the constants with variables, since then a different aspect kicks in: perl's compiler assigns an entry on the pad for the results of subexpressions, and this intermediate target then holds the extra copy of the string.

        I'm not sure whether any of the pad-peeking CPAN modules show these unnamed pad entries, but the -DX debugging flag will help at least to verify their existence, e.g.:

        perl -DXt -we 'my $l = 100; my $a = "A" x $l;'

        Hugo

        Just to chime in with my results. I don't know what version of perl sgifford was using, but I'm running perl 5.8.2 under Debian Linux. My results are:

        My system seems to start with half a meg more RAM used (556K), but the percentage freed seems similar (42.7% for me vs. 43.7% for sgifford).

        Does this imply a memory leak somewhere in perl, or that the garbage collector is caching the memory allocated for re-allocation later?

      Even if I use 'my' on variables ?

      Solli Moreira Honorio
      Sao Paulo - Brazil
        Yes, even then.
Re: Memory usage breakup
by ysth (Canon) on May 01, 2004 at 01:27 UTC
    One way or another, everything is referenced from a global, per-interpreter, or per-thread C variable whose name starts with PL_. The B module provides access to some of these, but working out total memory use that way is likely to be too hard.

    Devel::Peek provides some high-level memory statistics (which I have never used myself). I think they are only available in a perl compiled with perl's own malloc routines or with a special option set.

Re: Memory usage breakup
by gmpassos (Priest) on May 01, 2004 at 18:11 UTC
    Perl will free the memory used by SVs when they are cleaned up, folks!

    I have worked a lot on a function that cleans a package out of memory, and it works! You can see the work in Safe::World, where each Safe::World compartment has its own modules loaded; I can clean one of these compartments and unload the whole environment.

    I know that it works because after running the same script 1000 times, cleaning the compartments each time, the process uses 18 MB, while without cleaning it uses 150 MB.

    Actually, Perl recycles cleaned SVs, so when you create an SV you can't guarantee that its address wasn't used before. I saw this while building Hash::NoRef, whose objective is to store objects without interfering with the DESTROY system, i.e. without incrementing the reference count of those objects. A strange behaviour in the first version: when a stored object was destroyed, and another object was later created in an SV with the same address, the new object would take the place of the old one in the table, so the table appeared to hold an object that was never stored in it. This has now been fixed with weak references, and everything is fine.

    Don't forget that Perl's memory management is one of the best there is!

    If you want to know what is in memory and how it's used, take a look at this function; it scans a package and reports the size of each item:

    sub package_size_report {
        my $package = shift;
        eval { require B::TerseSize };
        if ($@) {
            return "You need B::TerseSize (from B::Size) installed to use &package_size_report!";
        }
        my $output = "Memory Usage for package $package\n\n";
        my ($subs, $opcount, $opsize) = B::TerseSize::package_size($package);
        $output .= "Totals: $opsize bytes | $opcount OPs\n\n";
        my ($clen, $slen, $nlen) = (0, 0, 0);
        my @keys = map  { $nlen = length > $nlen ? length : $nlen; $_ }
                   sort { $subs->{$b}{size} <=> $subs->{$a}{size} }
                   keys %$subs;
        $clen = length $subs->{$keys[0]}{count};
        $slen = length $subs->{$keys[0]}{size};
        for my $name (@keys) {
            my $stats = $subs->{$name};
            if ($name =~ /^my /) {    # lexicals have a size but no OP count
                $output .= sprintf "%-${nlen}s %${slen}d bytes\n",
                                   $name, $stats->{size};
            }
            else {
                $output .= sprintf "%-${nlen}s %${slen}d bytes | %${clen}d OPs\n",
                                   $name, $stats->{size}, $stats->{count};
            }
        }
        return $output;
    }

    Graciliano M. P.
    "Creativity is the expression of the liberty".