jaa has asked for the wisdom of the Perl Monks concerning the following question:

Is there an easy way to find out where all the memory is being used in my process?

For example something that can tell me how many scalars, and estimate their data size + Perl overheads on a per-package basis in bytes?

Or even better something that can take a nested data structure and summarise how much data + overhead (hash buckets, key arrays?) is being used at a specified depth?

Or code analysis tools that let me know about long lived large data structures that are not referenced again?

Does releasing a hash make the recovered memory available to subsequent Perl structures?

Are there any tips on efficient memory usage? I am not programming in a web environment, but rather a large commercial data processing environment. Current processes sometimes suck up to 1.5G RAM when performing set operations. They run ok, but we would like to increase our capacity to process larger sets.

Seeking wisdom, and offering thanks in advance,

Jeff

Replies are listed 'Best First'.
Re: How do I dump Perl's memory usage?
by broquaint (Abbot) on Feb 24, 2003 at 12:02 UTC
    For example something that can tell me how many scalars, and estimate their data size + Perl overheads on a per-package basis in bytes?
    You could put Elian's Devel::Size to good use. Something like this might be of help
    use strict;
    use Devel::Size qw(size total_size);

    sub mem_size {
        my($pkg, $type) = @_;
        die("mem_size(): unknown variable type '$type'\n")
            if $type !~ /^(?: HASH | ARRAY | SCALAR )\z/x;
        my $ret = 0;
        no strict 'refs';
        for (values %{"${pkg}::"}) {
            $ret += ( $type eq 'SCALAR'
                        ? size( ${ *$_{$type} } )
                        : total_size( *$_{$type} ) )
                if defined *$_{$type};
        }
        return $ret;
    }

    print "SCALAR mem usage in main:: is ", mem_size("main", "SCALAR"), $/;
    __output__
    SCALAR mem usage in main:: is 2098
    Or even better something that can take a nested data structure and summarise how much data + overhead (hash buckets, key arrays?) is being used at a specified depth?
    Again, you could use Devel::Size for this task by calling total_size() at the desired depth. Also, a hash in scalar context will return the number of used buckets and the number of allocated buckets (see perldata).
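    A small sketch of the idea, using the Devel::Size module from CPAN (the %config structure here is invented for illustration):

    ```perl
    use strict;
    use warnings;
    use Devel::Size qw(size total_size);

    # an invented nested structure
    my %config = (
        hosts => [ 'alpha', 'beta', 'gamma' ],
        ports => { http => 80, https => 443 },
    );

    # size() counts only the top-level structure;
    # total_size() follows references and counts the contents too
    print "top level only: ", size(\%config),       " bytes\n";
    print "with contents : ", total_size(\%config), " bytes\n";

    # ...and for a single branch at a chosen depth:
    print "just {ports}  : ", total_size($config{ports}), " bytes\n";
    ```

    Calling total_size() on a sub-reference like $config{ports} is what lets you summarise usage branch by branch.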
    Or code analysis tools that let me know about long lived large data structures that are not referenced again?
    Data structures that are no longer referenced will be garbage-collected, so you needn't worry about that (the one exception being circular references, which defeat perl's reference counting unless you break or weaken them).
    Does releasing a hash make the recovered memory available to subsequent Perl structures?
    That it does. Once a variable has been 'released', its memory is then available to future variables, so perl won't just keep on growing.
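    As a small illustration (the %cache hash is made up): clearing or undefining a hash hands its storage back to perl's allocator for reuse by later structures, though the process footprint as seen by the OS generally will not shrink.

    ```perl
    use strict;
    use warnings;

    # an invented hash holding ~1MB of string data
    my %cache = map { $_ => 'x' x 100 } 1 .. 10_000;

    # Either form releases the contents back to perl's allocator;
    # undef additionally gives back the hash's internal bucket array.
    %cache = ();        # empty the hash, keep its allocated buckets
    undef %cache;       # release everything

    print scalar(keys %cache), "\n";    # 0 -- the freed memory is
                                        # recycled for later variables,
                                        # not returned to the OS
    ```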
    HTH

    _________
    broquaint

Re: How do I dump Perl's memory usage?
by BrowserUk (Patriarch) on Feb 24, 2003 at 13:40 UTC

    Are there any tips on efficient memory usage?

    A few things to check for.

    • If you're passing hashes around between subs, make sure you're passing them by reference, not by value.

      sub foo {
          my ($hashref) = @_;
          for my $key (keys %{$hashref}) {
              do stuff;
          }
      }

      my %hash = (Gobs=>'many', of=>'loads', data=>'Huge');
      ...
      foo( \%hash );

      Instead of

      sub foo {
          my (%hash) = @_;    #! Wrong!
          for my $key (keys %hash) {
              do stuff;
          }
      }
      ...
      foo( %hash );           #! Wrong!

      The second method will effectively double the memory usage for the hash. Go a third level deep that way and you triple it.

    • Same applies to arrays.
    • When processing your hashes, use

      while ( my ($key, $value) = each %hash ) { do stuff; }

      rather than

      for my $key (keys %hash) { do stuff; }

      On a hash containing 500,000 4-char keys, the latter method will consume an extra 20MB to build the list, the former essentially causes no growth.

    • You mentioned "set operations". If you don't need to keep a copy of the original hashes...

      When doing unions, add the elements from the smaller of the two to the larger, rather than combining them into a third hash.

      Similarly for intersections: delete from one set the elements the other lacks, rather than building a third.

      exists $b{$k} or delete $a{$k} while ( $k, $v ) = each %a;

      rather than

      exists $b{$k} and $c{$k} = $v while ( $k, $v ) = each %a;

      If you do have to build a third hash to hold the results of such an operation and you discard it after processing, declare the temporary hash in as tight a scope as possible to ensure timely release.

    • If you're creating and destroying lots of temporary data structures, it might be worth using undef to prompt their timely destruction. I haven't found any way of verifying this yet--maybe someone else knows if this is no longer (or was ever) a useful thing to do.
    • If your code is using all global vars, or if, as in one or two cases I have seen here, the variables are being my'd in one huge lump at the top of the program or module, you should move as many of the declarations as close to their use as possible. Wrapping linear code in bare blocks and declaring the my's for temporary vars inside can reduce the footprint.
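    The in-place set operations described above can be sketched like this (the hash names and contents are invented; note that perldata documents deleting the key most recently returned by each() as the one safe modification during iteration):

    ```perl
    use strict;
    use warnings;

    my %a = ( apple => 1, pear => 2, plum => 3 );
    my %b = ( pear  => 1, plum => 1, kiwi => 1 );

    # union in place: fold the smaller hash into the larger via a
    # hash slice, instead of building a third hash from both
    @a{ keys %b } = values %b;
    # %a now holds apple, pear, plum and kiwi

    # intersection in place: delete from one hash the keys the other
    # lacks -- deleting the key each() just returned is safe
    my %c = ( apple => 1, pear => 2, plum => 3 );
    while ( my ($k) = each %c ) {
        delete $c{$k} unless exists $b{$k};
    }
    # %c now holds only pear and plum
    ```

    The slice assignment overwrites %a's values for shared keys with %b's, which is fine when the hashes are being used as sets and only the keys matter.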

    ...and remember: there are a lot of things monks are supposed to be, but lazy is not one of them.

    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: How do I dump Perl's memory usage?
by tachyon (Chancellor) on Feb 24, 2003 at 12:04 UTC

    Are there any tips on efficient memory usage?

    Perl is a compromise. Whenever there has been a tradeoff between speed and memory the usual response has been to use more memory.

    Depending on your application, the obvious solution is to leave as much data on disk as possible. Tie::Hash is one possible solution, as is putting the data in a database and using the database's built-in functionality to do as much pre-processing as possible.
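    A minimal tied-hash sketch using the core SDBM_File module (the 'scratch_db' filename is invented; for large data sets the CPAN DB_File or BerkeleyDB modules scale much better, as SDBM limits each key/value pair to roughly a kilobyte):

    ```perl
    use strict;
    use warnings;
    use Fcntl;
    use SDBM_File;

    # tie the hash to a pair of on-disk files (scratch_db.pag/.dir)
    tie my %on_disk, 'SDBM_File', 'scratch_db', O_RDWR | O_CREAT, 0666
        or die "cannot tie: $!";

    # reads and writes go through to disk, so the hash's contents
    # never all have to fit in RAM at once
    $on_disk{"record_$_"} = "value $_" for 1 .. 5;
    print scalar( grep { exists $on_disk{"record_$_"} } 1 .. 5 ), "\n";

    untie %on_disk;
    ```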

    Although this is pretty basic stuff, I will just mention that it is much more memory-efficient to do stuff like this:

    while (<FILE>) {
        do stuff;
    }

    # rather than

    @data = <FILE>;
    for (@data) {
        do stuff;
    }

    # pass data to and from subs as a reference (effectively a pointer)
    # this saves making duplicate copies of data structures
    my $data_ref = process_data(\@data);

    sub process_data {
        my $ref = shift;
        for (@$ref) {
            do stuff;
        }
        return $ref;
    }

    Perl also has the -i inplace edit function, which may be useful. There may also be value in actively undefing a data structure so it can be garbage-collected once your code has finished with it. The Devel:: range of modules provides lots of insight into speed/memory use in a program. If you are using grep (especially in scalar context), you should be aware that it will build a complete temp array and iterate through the entire data set...

    # this is short but memory intensive
    do_stuff() if grep { /something/ } @array;

    # compared to this which is faster and uses less memory
    # but takes four lines to write instead of one...
    for (@array) {
        next unless /something/;
        do_stuff();
        last;
    }

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: How do I dump Perl's memory usage?
by davorg (Chancellor) on Feb 24, 2003 at 11:41 UTC