NetWallah has asked for the wisdom of the Perl Monks concerning the following question:

GentleMonks -

I have a Perl program (don't you hate it when people call them 'scripts'?) that processes thousands of XML files that have been individually tgz-compressed.

I have been plagued by apparently random failures during "extract" in Archive::Extract. After hours of debugging, I believe the root cause is inside Cwd::cwd, which Archive::Extract uses. The cwd() function runs '/bin/pwd' via backticks; this works fine most of the time, but I finally caught a case where it returns undef.
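For comparison, Cwd also provides getcwd(), which (when Cwd's compiled XS part is available) asks the C library for the directory instead of running an external pwd, so it does not need to fork. This is only an illustrative sketch: where_am_i is a hypothetical helper, and because Archive::Extract calls cwd() internally, it does not change what that module does; it just shows the two calls side by side.

```perl
use strict;
use warnings;
use Cwd ();

# Hypothetical helper: try getcwd() first (no child process when Cwd's
# XS code is available) and only fall back to cwd(), which on this kind
# of system may run pwd via backticks and therefore needs a fork().
sub where_am_i {
    my $dir = Cwd::getcwd();
    return ($dir, 'getcwd') if defined $dir;

    $dir = Cwd::cwd();
    return ($dir, 'cwd') if defined $dir;

    # Both failed; after a backtick, $? == -1 means pwd never started.
    die "cannot determine working directory: \$?=$?, \$!=$!\n";
}

my ($dir, $how) = where_am_i();
print "$how: $dir\n";
```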

At the time it returns undef, I examined $!, which contained - you guessed it - 'Cannot allocate memory'.

The underlying tgz/xml file DOES process successfully when passed through the program individually, or in a small group of files.

Please advise how to diagnose this further, or recommend other steps.

UPDATE: Here is the failure I get:

Key 'archive' (filename.gz) is of invalid type for 'Archive::Extract::new' provided by main::parse_XML at PerfDBparse.pl line 447
 at /usr/share/perl/5.10/Params/Check.pm line 345
    Params::Check::check('HASH(0x2a3e660)', 'HASH(0x2a3e5a0)') called at /usr/local/share/perl/5.10.0/Archive/Extract.pm line 227
    Archive::Extract::new('Archive::Extract', 'archive', 'FileName.gz') called at PerfDBparse.pl line 447
    main::parse_XML('filename', 'allXML', 288600) called at PerfDBparse.pl line 131
    ...(Truncated, and file names changed to protect the innocent)...
OS: Linux MachineName 2.6.28-19-server #64-Ubuntu SMP Wed Aug 18 22:43:50 UTC 2010 x86_64 GNU/Linux

Memory info at time of failure:

dmin [ ~ ]$ free -m
             total       used       free     shared    buffers     cached
Mem:          1987       1873        114          0        125        275
-/+ buffers/cache:       1471        515
Swap:          475         33        442
This is the first time I'm dealing with an apparent memory leak. Any known issues? BTW, I use XML::Bare for XML parsing.
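One way to check whether this really is a leak is to log the process size after each file and see whether it keeps growing. A minimal, Linux-only sketch, assuming the VmSize and VmRSS fields of /proc/<pid>/status (both reported in kB); mem_usage_kb is a hypothetical helper, and the actual per-file processing is left as a placeholder comment:

```perl
use strict;
use warnings;

# Hypothetical helper (Linux-only): read VmSize and VmRSS, in kB,
# for a process (default: this one) from /proc/<pid>/status.
sub mem_usage_kb {
    my ($pid) = @_;
    $pid = $$ unless defined $pid;
    open my $fh, '<', "/proc/$pid/status"
        or die "cannot read /proc/$pid/status: $!\n";
    my %kb;
    while (my $line = <$fh>) {
        $kb{$1} = $2 if $line =~ /^(VmSize|VmRSS):\s+(\d+)\s+kB/;
    }
    close $fh;
    return \%kb;
}

# Log memory after each file; steady growth suggests a leak.
for my $file (@ARGV) {
    # ... placeholder: extract $file and parse it with XML::Bare ...
    my $m = mem_usage_kb();
    printf "%s: VmSize=%d kB VmRSS=%d kB\n", $file, $m->{VmSize}, $m->{VmRSS};
}
```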

     Syntactic sugar causes cancer of the semicolon.        --Alan Perlis

Re: backticks and 'Cannot allocate memory'
by JavaFan (Canon) on Sep 23, 2010 at 22:12 UTC
    At the time it returns undef, I examined $!
    You can only trust $! to contain something sensible immediately after a failed system call (which is something different from a call to system()). Since cwd() calls pwd inside backticks, there's no system call, let alone a failed one (unless the fork() failed, but even then, time has passed since). You may have more success checking $?.
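A sketch of decoding $? after a backtick command, following the conventions perlfunc documents for system(): -1 means the child could not be started (and $! then holds the reason), the low 7 bits hold a signal number, and the exit code is $? >> 8. The helper run_backticks is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in backticks and explain $?.
sub run_backticks {
    my ($cmd) = @_;
    my $out    = `$cmd`;
    my $status = $?;
    my $err    = "$!";    # only meaningful when $status == -1
    my $why;
    if ($status == -1) {
        $why = "could not start: $err";    # e.g. fork() failed
    } elsif ($status & 127) {
        $why = sprintf "killed by signal %d", $status & 127;
    } else {
        $why = sprintf "exited with status %d", $status >> 8;
    }
    return ($out, $status, $why);
}

my ($out, $status, $why) = run_backticks('pwd');
print "pwd: $why\n";
```

When the fork itself fails, $? is -1 and the message carries $!, which is the combination shown in the debugger session in this thread.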
      Here is what I get in $?:
      DB<5> x $?,$!
      0  '-1'
      1  'Cannot allocate memory'
      DB<6> x qx|pwd|
        empty array
      DB<7> x $?, $!
      0  '-1'
      1  'Cannot allocate memory'
      Memory utilization (ps -auxf):
      admin 5033 6.3 61.4 1301804 1251316 pts/1 S+ 16:56 2:17 | \_ perl -d Program-name


        $! being 1 is probably just a default value. I get $! as 1 as well, but it means something different:
        DB<1> x $?, $!
        0  0
        1  'Illegal seek'
        However, my pwd does succeed. It still leaves $! as 1, but its string value changes:
        DB<2> x qx|pwd|
        0  '/tmp
        '
        DB<3> x $?, $!
        0  0
        1  ''
        It's curious that your pwd fails. Is that from a debugging session where you aren't doing anything else? What happens if you use the fully qualified path to pwd? Anyway, an external pwd failing isn't something that happens at the Perl level.
Re: backticks and 'Cannot allocate memory'
by salva (Canon) on Sep 24, 2010 at 09:16 UTC
    It's funny because one of my coworkers had exactly the same problem last week.

    It fails because, under the hood, the backtick operator calls fork(), and the OS, following a conservative approach, aborts the operation because it doesn't have enough memory to hold two process images that big.
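In Perl terms, this is the fork() that backticks perform internally. When you call fork() yourself, the failure is visible directly: perlfunc documents that fork returns undef on failure, with the reason in $!. A minimal sketch; fork_and_wait is a hypothetical helper, and POSIX::_exit is used in the child so it exits without running END blocks or flushing the parent's buffered output a second time:

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork a child that exits with $exit_code,
# wait for it, and return its exit status.
sub fork_and_wait {
    my ($exit_code) = @_;
    my $pid = fork;
    # undef means no child was created; ENOMEM would show up in $! here.
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child process: leave immediately.
        POSIX::_exit($exit_code);
    }
    waitpid($pid, 0);
    return $? >> 8;
}

print "child exited with ", fork_and_wait(0), "\n";
```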

    The simplest workaround is to allow the kernel to do memory overcommit.
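On Linux, the overcommit policy is the vm.overcommit_memory sysctl (0 = heuristic, the default; 1 = always overcommit; 2 = strict accounting), which root can change with sysctl -w vm.overcommit_memory=1. A minimal sketch that only reads and describes the current setting; the helper names are hypothetical. Note that mode 1 trades the fork failure for the risk that the OOM killer terminates a process later if memory really does run out.

```perl
use strict;
use warnings;

# Meaning of each vm.overcommit_memory value (Linux).
my %MODE = (
    0 => 'heuristic overcommit (default)',
    1 => 'always overcommit',
    2 => 'strict accounting (no overcommit)',
);

# Hypothetical helper: describe a mode number.
sub describe_overcommit {
    my ($mode) = @_;
    return exists $MODE{$mode} ? $MODE{$mode} : "unknown mode '$mode'";
}

# Hypothetical helper: read the current mode from /proc.
sub read_overcommit_mode {
    my $path = '/proc/sys/vm/overcommit_memory';
    open my $fh, '<', $path or die "cannot read $path: $!\n";
    chomp(my $mode = <$fh>);
    close $fh;
    return $mode;
}

my $mode = read_overcommit_mode();
printf "vm.overcommit_memory = %s (%s)\n", $mode, describe_overcommit($mode);
```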

    Alternatively, you could increase the swap space on the machine.
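To see how much swap and committable memory the machine has, /proc/meminfo gives more detail than free -m, including CommitLimit and Committed_AS (CommitLimit is only enforced in strict mode, vm.overcommit_memory=2). A minimal, Linux-only sketch; meminfo_kb is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper (Linux-only): parse /proc/meminfo into a
# hash of field name => value in kB.
sub meminfo_kb {
    my ($path) = @_;
    $path = '/proc/meminfo' unless defined $path;
    open my $fh, '<', $path or die "cannot read $path: $!\n";
    my %kb;
    while (my $line = <$fh>) {
        $kb{$1} = $2 if $line =~ /^(\w+):\s+(\d+)/;
    }
    close $fh;
    return \%kb;
}

my $m = meminfo_kb();
for my $key (qw(MemTotal MemFree SwapTotal SwapFree CommitLimit Committed_AS)) {
    my $v = defined $m->{$key} ? "$m->{$key} kB" : 'n/a';
    printf "%-13s %s\n", $key, $v;
}
```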

    Update:

    For Perl porters reading this: wouldn't it be possible to use vfork(), where the OS supports it, to implement qx, system, open(... | ...), etc.?

    At some point in the past, every decent OS gained copy-on-write (COW), making vfork mostly useless; but now machines with no swap, or only a very small swap space, are becoming common, which makes vfork pertinent again!

      Stevens and Rago write in "Advanced Programming in the UNIX Environment":
      The vfork function originated with 2.9BSD. Some consider the function a blemish, but all the platforms covered in this book support it. In fact, the BSD developers removed it from the 4.4BSD release, but all the open source BSD distributions that derive from 4.4BSD add support for it back into their own releases. The vfork function is marked as an obsolete interface in Version 3 of the Single UNIX Specification.
      Thank you very much! This solved a mystery that had us all stumped for a week.
Re: backticks and 'Cannot allocate memory'
by NetWallah (Canon) on Sep 25, 2010 at 00:54 UTC
    UPDATE/Progress report:

    I changed the code to avoid Archive::Extract, using the 2-arg pipe "open" to run "gunzip" directly and get the content.
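For reference, a sketch of the same idea using the 3-argument, list form of a pipe open, which avoids the shell and reports both failure paths: open() fails if the fork (or exec) fails, and close() fails if gunzip exits non-zero. This still forks, so by itself it would not avoid the "Cannot allocate memory" error. gunzip_via_pipe is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: read a .gz file through an external gunzip
# using a list-form pipe open (no shell is involved).
sub gunzip_via_pipe {
    my ($file) = @_;
    open(my $fh, '-|', 'gunzip', '-c', $file)
        or die "cannot start gunzip for $file: $!\n";   # fork failure lands here
    my $content = do { local $/; <$fh> };
    # For a piped open, close() is false if gunzip exited non-zero;
    # in that case $! is 0 and the exit status is in $?.
    close($fh)
        or die "gunzip failed for $file: ",
               ($! ? "$!" : "exit status " . ($? >> 8)), "\n";
    return $content;
}

for my $file (@ARGV) {
    my $xml = gunzip_via_pipe($file);
    print "$file: ", length($xml), " bytes\n";
}
```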

    This gave me the same error - after about 30 files, the open/fork failed with "cannot allocate memory".
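Since every external gunzip needs a fork(), one fork-free alternative (not what was used in this thread) is to decompress inside the Perl process with IO::Uncompress::Gunzip, which ships with Perl since 5.10. A minimal sketch, assuming each input is a single gzip-compressed XML document like the filename.gz in the error above; a real tar archive inside the .gz would still need a tar reader. gunzip_in_process is a hypothetical helper:

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: decompress a .gz file in this process,
# so no child has to be forked from a large parent image.
sub gunzip_in_process {
    my ($file) = @_;
    my $content;
    gunzip($file => \$content)
        or die "gunzip of $file failed: $GunzipError\n";
    return $content;
}

for my $file (@ARGV) {
    my $xml = gunzip_in_process($file);
    print "$file: ", length($xml), " bytes\n";
}
```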

    Then I googled "perl fork cannot allocate memory" and found a recommendation to increase the swap size.

    I added a 1GB swap space (the original was 0.5GB), and things are running much better!!!

    I need to wait for the program to complete, but it is already running for much longer than before.

    FYI, the machine (actually a VM) has 1GB of memory.

    I will post another update if it fails; otherwise, happy weekend!!!

    Update 2: I'm happy to confirm that the program now completes.
