http://qs1969.pair.com?node_id=794209

amir_e_a has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am using Parse::MediaWikiDump to analyze a static XML dump of Wikipedia. I am reading every page and saving some interesting info about it to a text file. (If it's relevant, i'm running Cygwin Perl 5.10 on Windows XP.)

The memory usage of this program keeps growing quite quickly as i am progressing through the dump, even though i believe that i am not aggregating any info in variables - only in files. Of course, i might be wrong - it is possible that i am aggregating something without noticing. And maybe Perl's garbage collector isn't doing its job. And maybe some internal variable in Parse::MediaWikiDump is aggregating data.

I can start sprinkling Devel::Size::size() calls around the code, but that would be rather annoying, because if i understand correctly it only works per variable, which means that i'll have to write such a line for every variable, and i have a lot of them, not to mention the variables in the external module.

Is there any convenient tool which can produce a detailed list of all the memory usage of a Perl program during runtime?

I also tried periodically checking the value of Devel::Leak::NoteSV(), and it indeed keeps growing, but i don't really know what i can do with it.

Thanks in advance for any help.

Re: how much memory each Perl variable uses
by BrowserUk (Patriarch) on Sep 08, 2009 at 20:24 UTC
    I can start sprinkling Devel::Size::size() calls around the code, but that would be rather annoying,

    If you run Devel::Size against %:: in the outermost loop of your program and then compare the output from one run to the next, it can help you isolate where the memory growth is occurring; you can then look more closely at the relevant area(s). (BTW: You'll need v0.72; 0.71 will blow up in mysterious ways if you try this.)

    use Devel::Size qw[ total_size ];
    printf "%30s: %d\n", $_, total_size( $::{ $_ } ) for keys %::;

    _<C:/Perl64/site/lib/auto/Cwd/Cwd.dll: 499
    version::: 13009
    /: 490
    stderr: 303
    SIG: 5964
    ,: 509
    Tie::: 7454
    utf8::: 5741
    ": 405
    constant::: 28942
    re::: 44435
    DynaLoader::: 63651
    mro::: 5947
    Devel::: 32202
    Cwd::: 126954
    strict::: 9683
    stdout: 303
    ^R: 274
    |: 500
    Regexp::: 1266
    Term::: 1096
    _code: 493
    UNIVERSAL::: 3012
    overload::: 59362
    $: 293
    time: 994
    File::: 82769
    ^RE_TRIE_MAXBUF: 323
    Dos::: 831
    size: 964
    Data::: 168049
    _<..\universal.c: 403
    ^RE_DEBUG_FLAGS: 319
    _<HiRes.c: 381
    BEGIN: 253
    _<..\mro.c: 391
    !: 517
    IO::: 1101
    ^O: 517
    total_size: 988
    ^X: 370
    pp: 19157
    _: 497
    ActivePerl::: 137287
    _<C:/Perl64/lib/constant.pm: 463
    Exporter::: 120414
    Internals::: 4478
    STDIN: 253
    Config::: 113324
    warnings::: 93953
    DB::: 1071
    Time::: 47484
    EPOC::: 833
    _<.\win32.c: 393
    ^V: 936
    _<perllib.c: 393
    2: 524
    _<Cwd.c: 375
    cmpthese: 37036
    1: 533
    ^WARNING_BITS: 553
    CORE::: 1086
    _<Size.c: 379
    Win32CORE::: 1174
    attributes::: 1176
    stdin: 301
    ARGV: 499
    INC: 5306
    _<..\activeperl.c: 405
    _<C:/Perl64/site/lib/auto/Devel/Size/Size.dll: 533
    Scalar::: 2018
    ENV: 10062
    ?: 493
    vars::: 15120
    subs::: 5644
    _<..\perlio.c: 397
    _<Win32CORE.c: 397
    XSLoader::: 27593
    main::: 1006356
    AutoLoader::: 40075
    VMS::: 2363
    Carp::: 44867
    Win32::: 17184
    PerlIO::: 3024
    0: 541
    : 850
    _<..\xsutils.c: 399
    @: 1074
    Benchmark::: 200847
    n: 282
    STDOUT: 255
    3: 504
    ]: 393
    _<C:/Perl64/lib/auto/Time/HiRes/HiRes.dll: 509
    ^W: 499
    MIME::: 1412
    STDERR: 255
    ActiveState::: 72715
    _<dl_win32.c: 401
    sleep: 998

    It won't always find the leak, but it can point you in the right direction.
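The run-to-run comparison suggested above might be sketched roughly like this. The helper names `growth` and `snapshot` and the 1024-byte threshold are hypothetical and chosen only for illustration; `total_size` is the one call taken from Devel::Size. Like the one-liner, it only sees package (global) variables, and the demo numbers are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: given two { name => bytes } snapshots, return
# [ name, delta ] pairs for entries that grew by at least $min bytes,
# largest growth first.
sub growth {
    my ( $before, $after, $min ) = @_;
    my %delta;
    for my $name ( keys %$after ) {
        my $d = $after->{$name} - ( $before->{$name} // 0 );
        $delta{$name} = $d if $d >= $min;
    }
    return map { [ $_, $delta{$_} ] }
           sort { $delta{$b} <=> $delta{$a} || $a cmp $b } keys %delta;
}

# Hypothetical helper: size every entry of the main symbol table.
# Devel::Size is a CPAN module (not core), so it is loaded on demand.
sub snapshot {
    require Devel::Size;
    my %sizes;
    $sizes{$_} = Devel::Size::total_size( $::{$_} ) for keys %::;
    return \%sizes;
}

# Demo with made-up numbers (not real measurements):
my %before = ( 'main::' => 1000, 'Data::' => 500 );
my %after  = ( 'main::' => 9000, 'Data::' => 510 );
printf "%30s: +%d\n", @$_ for growth( \%before, \%after, 1024 );
```

In a real script one would presumably call `snapshot()` before and after a batch of pages in the outermost loop and print `growth( $before, $after, 1024 )`, so that only the entries that actually grew show up.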


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks a lot! This looks like the right direction, but for some reason on my machine it immediately crashes:

      7 [main] perl 4156 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
      Segmentation fault (core dumped)
      Maybe Devel::Size can't work on Cygwin? (Strange, but possible...)
        Maybe Devel::Size can't work on Cygwin? (Strange, but possible...)

        It only inspects Perl's internal structures, so should run anywhere Perl does, but since I don't use Cygwin, I neither know nor care.

        Why not run your script native for the purposes of solving your problem? Either on a native *nix platform if that's your target and your reason for using Cygwin (though developing under an emulator is a silly idea). Or native Win32 if that's your target. (If so, why use the emulator?)



      ...And besides, doesn't this approach miss out on "my" variables? Most of the variables there are defined as "my".

      Correct me if i'm wrong.

        Yes, but if you combine it with Devel::DumpSizes it can help you track down problems.
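Devel::DumpSizes has its own interface, which is not shown here. As a rough illustration of the same general idea for "my" variables, the sketch below assumes the CPAN modules PadWalker and Devel::Size are installed; the helper name `lexical_sizes` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report the total size of every "my" variable that is
# in scope at the point where this sub is called. PadWalker and Devel::Size
# are CPAN modules (not core), so they are loaded on demand.
sub lexical_sizes {
    require PadWalker;
    require Devel::Size;
    my $pad = PadWalker::peek_my(1);    # 1 = the caller's scope
    my %size;
    for my $name ( keys %$pad ) {
        # $pad->{$name} is a reference to the variable itself
        $size{$name} = Devel::Size::total_size( $pad->{$name} );
    }
    return \%size;
}

# Example call, run only if both modules are available.
if ( eval { require PadWalker; require Devel::Size; 1 } ) {
    my @pages = ( 'x' x 100 ) x 1000;   # stand-in for real data
    my $sizes = lexical_sizes();
    printf "%20s: %d\n", $_, $sizes->{$_}
        for sort { $sizes->{$b} <=> $sizes->{$a} } keys %$sizes;
}
```

Calling something like this once per iteration of the main loop and watching which names keep growing would cover the lexicals that the `%::` walk cannot see.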


Re: how much memory each Perl variable uses
by Anonymous Monk on Sep 08, 2009 at 20:01 UTC
    Ignoring the more useful question about debugging, might this be your root problem?
    memory consumption

      I didn't exactly understand which problem you are referring to, but the comments there give me a couple of ideas for slightly easier debugging, so thanks for that. For example, maybe i'll try to use fewer global variables.

      However, unlike the program described there, this is a static analyzer - it is not designed to run continually, but to process a dump and that's all. The trouble is that with MediaWiki dumps that have several tens of thousands of pages the program runs out of memory and never finishes processing the whole dump. (FWIW this machine has 2 GB of RAM.)

      So the problem is probably not the same and my question still stands - is there a way to inspect all the variables of a program and the modules that it uses and find out their memory consumption?