You could take a look at Devel::Size, Devel::Size::Report, Devel::Peek, Devel::FindGlobals and, if you can build your own copy of Perl for the target platforms, Devel::LeakTrace.

That said, generating a 1.7 GB footprint from a script that processes a couple of 10 MB files means you are doing one of two things. The first is building a lot of duplicate data structures (hashes or arrays). This is quite easy to do with several of the Graph::* and Set::* type modules, many of which are quite profligate with memory.

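This is especially true if you're building graphs that result in self-referencing trees and the like, and you don't have code to explicitly break the circular references. Perl frees data by reference counting, so the members of a cycle keep each other alive and are not reclaimed while the program runs.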
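A minimal sketch of the circular-reference problem and one standard fix, `weaken` from the core Scalar::Util module. The `Node` class, `add_child` and `build_tree` are hypothetical names for illustration only; the Graph::* modules have their own internals. It needs Perl 5.14 or later for the `package NAME BLOCK` syntax and `${^GLOBAL_PHASE}`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @freed;    # names of nodes whose destructor has run

# Hypothetical tree node: each child holds a reference back to its parent.
package Node {
    use Scalar::Util qw(weaken);

    sub new {
        my ( $class, $name ) = @_;
        return bless { name => $name, children => [], parent => undef }, $class;
    }

    sub add_child {
        my ( $self, $child, $weak ) = @_;
        push @{ $self->{children} }, $child;    # parent -> child
        $child->{parent} = $self;               # child -> parent: a cycle
        weaken( $child->{parent} ) if $weak;    # a weak ref adds no refcount
        return $child;
    }

    sub DESTROY {
        my ($self) = @_;
        # Ignore leaked objects that are only reaped at interpreter shutdown.
        return if ${^GLOBAL_PHASE} eq 'DESTRUCT';
        push @freed, $self->{name};
    }
}

# Build a two-node tree and let the only external reference go out of scope.
sub build_tree {
    my ($weak) = @_;
    my $tag  = $weak ? 'weak' : 'strong';
    my $root = Node->new("root-$tag");
    $root->add_child( Node->new("child-$tag"), $weak );
    return;
}

build_tree(0);
my $freed_strong = @freed;
print "strong back-reference: $freed_strong node(s) freed\n";    # 0: the cycle leaks

@freed = ();
build_tree(1);
my $freed_weak = @freed;
print "weak back-reference:   $freed_weak node(s) freed\n";      # 2: both nodes reclaimed
```

The other option is to break the cycle by hand before dropping the last outside reference, for example by deleting each child's `parent` entry or emptying the `children` arrays when you are done with the tree.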

The other way to chew up large volumes of RAM unnecessarily is to pass lots of big lists around between subroutines, rather than array references. The classic example is something like:

This consumes 390 MB to process 10 MB of data:

#! perl -slw
use strict;

# Each sub copies its whole argument list into a lexical array, builds a
# second array, then returns it as a list that the caller copies yet again.
sub clean {
    my( @dirty ) = @_;
    my @clean;
    for( @dirty ) {
        s[\s*(\S+)\s*][$1];
        push @clean, $_;
    }
    return @clean;
}

sub double {
    my( @data ) = @_;
    my @doubled;
    for( @data ) {
        push @doubled, $_ * 2;
    }
    return @doubled;
}

open FILE, '<', 'data\1millionlines.dat' or die $!;
my @data = <FILE>;
my @cleaned = clean( @data );
my @doubled = double @cleaned;
print for @doubled;
printf 'Check mem.';
<STDIN>;    # pause so memory use can be checked before exit

Whereas this does the same processing in 170 MB, using references and side effects to avoid duplicating large lists:

#! perl -slw
use strict;

# Both subs take an array reference and modify the caller's array in place.
sub clean {
    my( $dirty ) = @_;
    s[\s*(\S+)\s*][$1] for @$dirty;
}

sub double {
    my( $data ) = @_;
    $_ *= 2 for @$data;
}

open FILE, '<', 'data\1millionlines.dat' or die $!;
my @data = <FILE>;
clean( \@data );
double( \@data );
print for @data;
printf 'Check mem.';
<STDIN>;    # pause so memory use can be checked before exit

And this does the same using less than 2 MB by avoiding building large lists in the first place:

perl -nlwe" m[\s*(\S+)\s*] and print $1*2" data\1millionlines.dat
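The same line-at-a-time approach written out as a script, which is easier to extend than a one-liner. This is a sketch: `double_stream` and the script name are hypothetical. Because each line is read, processed and printed before the next is read, memory use should stay roughly constant however large the input file is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read from $in one line at a time and write each doubled value to $out.
# Only the current line is held in memory; no array of lines is ever built.
sub double_stream {
    my ( $in, $out ) = @_;
    while ( defined( my $line = <$in> ) ) {
        # Same pattern as the one-liner: grab the first non-blank field.
        next unless $line =~ m[\s*(\S+)\s*];
        print {$out} $1 * 2, "\n";
    }
}

# Usage: perl double.pl data\1millionlines.dat   (script name is hypothetical)
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot open '$file': $!";
    double_stream( $in, \*STDOUT );
    close $in;
}
```

Passing the filehandles in, rather than opening them inside the sub, also lets you feed it an in-memory handle (`open my $fh, '<', \$string`) when testing.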

It's a contrived example, but it illustrates the points.
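If you need a new, cleaned array but don't want to modify the caller's data as a side effect, a middle ground is to pass and return array references. A sketch, with hypothetical sub names `cleaned_copy` and `doubled_copy`: it still holds a second copy of the data, but it avoids the extra copies made by `my( @dirty ) = @_` on the way in and by returning a flat list that the caller then assigns to yet another array on the way out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Takes an array ref, returns a ref to a NEW array; the caller's data is untouched.
sub cleaned_copy {
    my ($dirty) = @_;
    my @clean = map { my $s = $_; $s =~ s[\s*(\S+)\s*][$1]; $s } @$dirty;
    return \@clean;
}

# Same pattern: only references cross the sub boundary, never the whole list.
sub doubled_copy {
    my ($data) = @_;
    return [ map { $_ * 2 } @$data ];
}

my @data    = ( "  1 \n", " 2\n", "3  \n" );
my $cleaned = cleaned_copy( \@data );
my $doubled = doubled_copy($cleaned);
print "$_\n" for @$doubled;    # prints 2, 4 and 6 on separate lines
```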


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

In reply to Re: Extensive memory usage by BrowserUk
in thread Extensive memory usage by TheMarty
