cyberconte has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to capture the output of an external program. For this, I've always used backticks...
`program args`
However, I've run into a situation where the program I need to run for a particular script prints a tremendous amount of output (I'm talking on the order of hundreds of megabytes). I've never bothered with the details of backticks, but now it's an issue. Even in a
while(`program args`)
loop, Perl apparently stores all of the data internally and waits for the program to finish before actually starting the loop. This is unacceptable in this situation, as it will easily bring the system to its knees.

Instead, I used
open PROG, "program args |";
while (<PROG>) { ... }
close PROG;
to alleviate this strain, but is this really the best way of doing it? Has anyone else run into this problem in the past, and how was it handled?

Re: External program with large amounts of output
by Juerd (Abbot) on Apr 15, 2002 at 13:31 UTC
Re: External program with large amounts of output
by Fletch (Bishop) on Apr 15, 2002 at 13:53 UTC
    while(`program args`)

    FYI, this is just the same as doing:

    { my @a = `program args`; while( @a ) { ... } }

    Backticks / qx// always wait for the spawned program to exit before returning.
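
A minimal sketch of the streaming alternative, for contrast with backticks: a pipe open hands lines to the loop as the child produces them, so only one line needs to be in memory at a time. The command in @cmd is a hypothetical stand-in (a child perl printing 100,000 lines), and the list form of open assumes a reasonably recent perl; substitute the real program and arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real program: a child perl that prints
# 100_000 lines. Replace @cmd with the actual command and its arguments.
my @cmd = ($^X, '-e', 'print "line $_\n" for 1 .. 100_000');

# List-form pipe open: no shell is involved, and lines become available
# as the child writes them rather than after it exits.
open(my $prog, '-|', @cmd) or die "Cannot start @cmd: $!\n";

my $count = 0;
while (my $line = <$prog>) {
    $count++;    # process $line here; only the current line is held
}

# close() on a pipe waits for the child and sets $? to its wait status.
close($prog)
    or die $! ? "Error closing pipe: $!\n"
              : "Command exited with status " . ($? >> 8) . "\n";

print "Read $count lines\n";
```

Checking the return value of close matters here because a failing child is only reported once the pipe is closed.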

Re: External program with large amounts of output
by strat (Canon) on Apr 15, 2002 at 14:47 UTC
    I prefer a pipe open to backticks because it is much easier to check for errors, and it works like a filehandle...

    unless (open (CMDREAD, "$program $args |")) {
        die "ERROR in executing $program $args: $!\n";
    }
    else {
        while (<CMDREAD>) {
            # do something with $_
        }
        close (CMDREAD);
    } # else
    But if it really is a big amount of data, I'd run the program with its output written to a file and then read it from there with
    while (<FILE>)
    or the like...
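
One way the write-to-a-file approach might look from inside a single script, as a sketch only: STDOUT is temporarily pointed at a temporary file while the command runs, then the file is read back line by line. The command in @cmd and the use of File::Temp are assumptions made for illustration, not part of the original suggestion.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real program.
my @cmd = ($^X, '-e', 'print "record $_\n" for 1 .. 1000');

# Temporary file to hold the output; removed when the script exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Save our own STDOUT, point STDOUT at the file, run the command,
# then restore STDOUT. The child inherits the redirected handle.
open(my $saved, '>&', \*STDOUT) or die "Cannot save STDOUT: $!\n";
open(STDOUT, '>', $path)        or die "Cannot redirect STDOUT: $!\n";
my $status = system(@cmd);
open(STDOUT, '>&', $saved)      or die "Cannot restore STDOUT: $!\n";
die "Command failed with wait status $status\n" if $status != 0;

# Read the saved output one line at a time.
open(my $in, '<', $path) or die "Cannot read $path: $!\n";
my $records = 0;
while (my $line = <$in>) {
    $records++;    # do something with $line
}
close($in);

print "Processed $records records\n";
```

The cost is disk space for the whole output, in exchange for being able to reread it.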

    Best regards,
    perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"

      It might also be reasonable to use the operating system to save the output to a file, and then load the file.

      % extprog >big.data
      % myperl big.data

      Among other benefits, this lets you replay the output at will, allowing you to tweak the script without too much effort. You could also munge the data file to extract the first n records, as an additional way of testing.

      Finally, you can take all these data files you create in the process of testing and use them for regression testing: when you make a substantial change to your code, all those files will come in handy for verifying that you haven't borken anything.

      The code you present seems a little odd to me. I would sooner have written it as

      open(CMDREAD, "$program $args |")
          or die "ERROR in executing $program $args: $!\n";
      while (<CMDREAD>) {
          # do something with $_
      }
      close (CMDREAD);

      i.e., lose the else. Sort of flatter, I guess. That, and I only use unless as a statement modifier, not for flow control.

      $foo = $bar + 1 unless $moon->phase < 0.2;


      print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
        Hello grinder,

        your way is much shorter and perhaps easier to read.

        The reasons why I wrapped it into an unless-else-block are the following:

        • If I post code for beginners or somewhat more advanced Perl programmers, I try to post code that is as easy and flexible as possible. If they want to do something before die, the unless construct is much easier to extend. (You could write e.g. open (CMDREAD, ...) or &DoSomething and die(...); or the like, but then you have to care about the return value of &DoSomething, and it becomes more complicated than necessary; I have seen this on www.perl.de.)
        • The else block is not necessary with die, return or the like. But if you replace the die with warn, all of a sudden the else becomes very important. Some scripts are changed very often over the years (because requirements change or whatever), so I try to build complete logical blocks whenever possible.
        • I hardly use die in my programs, because with the stuff I usually write it is often not OK to just die with a message. The script should try to recover from the error by itself if possible, or at least gather some additional information about it or do some cleaning up. So "open or die" is idiomatic Perl, but very often not usable for me, and in most of my posted code die is just a placeholder for better error handling.
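
The second and third points above might be sketched as follows: with warn in place of die, the else branch is what keeps the read loop from running on a failed open. count_output_lines is a hypothetical helper invented for this illustration, and the lexical filehandle with list-form open is an assumption (the code earlier in the thread uses a bareword handle and a command string).

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines a command prints. On failure it
# warns and returns undef instead of dying, so the caller can recover.
sub count_output_lines {
    my @cmd = @_;
    my $count;

    unless (open(my $cmdread, '-|', @cmd)) {
        warn "ERROR in executing @cmd: $!\n";
        # room for cleanup, a retry, or a fallback data source
    }
    else {
        $count = 0;
        while (my $line = <$cmdread>) {
            $count++;    # do something with $line
        }
        # close() reports a child that exited with a failure status.
        unless (close($cmdread)) {
            warn "Command @cmd failed, wait status $?\n";
            $count = undef;
        }
    }
    return $count;
}

my $n = count_output_lines($^X, '-e', 'print "x\n" for 1 .. 5');
print defined $n ? "Got $n lines\n" : "No usable output\n";
```

Returning undef rather than dying leaves the decision about what to do next with the caller.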

        I'm very interested in what other people think about this issue, so I'd welcome lots of comments.

        Best regards,
        perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"