qazwart has asked for the wisdom of the Perl Monks concerning the following question:

Right now, I have a Kornshell script to do this, but I believe it will be faster and more efficient if I rewrite it in Perl.

One of the lines in my shell script does this:

$ cvs rlog -r$OLD::$NEW -SN $module 2> errors.txt > output.txt
I parse "output.txt" to find all the release notes and files that have been changed. In fact, one of the reasons I want to redo this in Perl is that I believe I can do this part of the job more efficiently in Perl. No real problem here. Very basic programming stuff.

The problem is with the "errors.txt" file. I take this file, and do quite a bit of parsing:

First I grep out all the lines that say "warning: no revision `$OLD' in" into one file. Then I grep out all the lines that say "warning: no revision `$NEW' in" into another file. After that, I do a unified diff on the two files. Lines that start with a "-" are for files that have been added since release $OLD. Lines that start with a "+" are for files that have been deleted since release $OLD.

It takes quite a bit of processing: I have to capture STDERR in a file, grep it twice, diff the two results, parse the diff output, and separate out the file name from the rest of each resulting line.

It should be much, much easier in Perl. I could do everything in a single pass and avoid all temporary files.
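
To give an idea of the kind of single pass I'm picturing, something along these lines (an untested sketch; the exact warning text and the way the STDERR lines get captured are assumptions, not verified against real cvs output):

use strict;
use warnings;

# Sketch only: $OLD, $NEW, and the source of @err_lines are placeholders.
# Assumed warning format:
#   cvs rlog: warning: no revision `1.41' in `path/to/file,v'
my $OLD = "1.41";
my $NEW = "1.42";
my @err_lines = <>;    # however the captured STDERR ends up here

my (%missing_old, %missing_new);
for my $line (@err_lines) {
    if ($line =~ /warning: no revision `\Q$OLD\E' in `(.+),v'/) {
        $missing_old{$1} = 1;
    }
    elsif ($line =~ /warning: no revision `\Q$NEW\E' in `(.+),v'/) {
        $missing_new{$1} = 1;
    }
}

# A file missing only revision $OLD was added since $OLD;
# a file missing only revision $NEW was deleted since $OLD.
print "added:   $_\n" for grep { !$missing_new{$_} } sort keys %missing_old;
print "deleted: $_\n" for grep { !$missing_old{$_} } sort keys %missing_new;

That would replace the two greps, the diff, and the temporary files with a couple of hashes.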

However, that's the problem. I can do an "open" on the CVS command to capture STDOUT as if it were a simple text file. But how in the world do I process the STDERR from that CVS command at the same time? I could (as I do in Kornshell) save the output to a file and open that file later, but it strikes me that there has to be a way to operate on both STDOUT and STDERR at the same time.

Replies are listed 'Best First'.
Re: Parsing STDERR and STDOUT at the same time
by liverpole (Monsignor) on Feb 01, 2007 at 22:02 UTC
    Hi qazwart,

    The following code is based on an example in perlfaq8 (under the question "How can I capture STDERR from an external command?"):

    #!/usr/bin/perl -w

    # Strict
    use strict;
    use warnings;

    # Libraries
    use IPC::Open3;
    use Symbol qw(gensym);
    use IO::File;

    ####################
    ### Main program ###
    ####################

    # User-defined
    my $module = "mymodule";
    my $OLD = "1.41";
    my $NEW = "1.42";
    my $cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";

    local *CATCHOUT = IO::File->new_tmpfile;
    local *CATCHERR = IO::File->new_tmpfile;

    my $pid = open3(gensym(), ">&CATCHOUT", ">&CATCHERR", $cmd);
    waitpid($pid, 0);

    seek(\*CATCHOUT, 0, 0);
    seek(\*CATCHERR, 0, 0);

    process_stdout(*CATCHOUT);
    process_stderr(*CATCHERR);

    sub process_stdout {
        my ($outfh) = @_;
        while (my $line = <$outfh>) {
            # Handle $line from STDOUT
            chomp $line;
            print "\e[102m$line\e[m\n";    # Change this
        }
    }

    sub process_stderr {
        my ($errfh) = @_;
        while (my $line = <$errfh>) {
            # Handle $line from STDERR
            chomp $line;
            print "\e[101m$line\e[m\n";    # Change this
        }
    }

    I even tested it on a cvs module locally to make sure it handles both STDOUT and STDERR from the cvs rlog command correctly.

    At the moment it just displays the lines it reads (in different colors: green for STDOUT, red for STDERR), but you can modify the two lines marked "Change this" above to suit your needs.


      That will deadlock. If either the STDOUT or the STDERR pipe fills up, waitpid will never return. That's why select is required. (Or in this case, can_read since we don't feed any input to cvs.)
      #!/usr/bin/perl

      use strict;
      use warnings;

      use IO::Select qw( );
      use IPC::Open3 qw( open3 );

      use constant BLOCK_SIZE => 4096;

      sub process_stdout {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[out:$_]\n");
          }
      }

      sub process_stderr {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[err:$_]\n");
          }
      }

      {
          # User-defined
          my $module = "mymodule";
          my $OLD = "1.41";
          my $NEW = "1.42";
          my $cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";

          my ($fh_cvs_in, $fh_cvs_out, $fh_cvs_err);
          my $pid = open3($fh_cvs_in, $fh_cvs_out, $fh_cvs_err, $cmd);

          my $r_sel = IO::Select->new($fh_cvs_out, $fh_cvs_err);

          my $cvs_out = '';
          my $cvs_err = '';

          while ($r_sel->handles()) {
              my @r = $r_sel->can_read();
              foreach my $fh (@r) {
                  if ($fh == $fh_cvs_out) {
                      my $rv = sysread($fh, $buf, BLOCK_SIZE, length($buf));
                      if (not defined $rv) {
                          die("Unable to communicate with CVS: $!\n");
                      }
                      if (not $rv) {
                          # End of file
                          $r_sel->remove($fh_cvs_out);
                      }
                  }
                  elsif ($fh == $fh_cvs_err) {
                      my $rv = sysread($fh, $buf, BLOCK_SIZE, length($buf));
                      if (not defined $rv) {
                          die("Unable to communicate with CVS: $!\n");
                      }
                      if (not $rv) {
                          # End of file
                          $r_sel->remove($fh_cvs_err);
                      }
                  }
              }
          }

          waitpid($pid, 0);

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_out directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_out);
              process_stdout($fh);
          }

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_err directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_err);
              process_stderr($fh);
          }
      }

      IPC::Run is designed to simplify this, but I have never used it.

      use strict;
      use warnings;

      use IPC::Run qw( run );

      sub process_stdout {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[out:$_]\n");
          }
      }

      sub process_stderr {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[err:$_]\n");
          }
      }

      {
          # User-defined
          my $module = "mymodule";
          my $OLD = "1.41";
          my $NEW = "1.42";
          my @cmd = ( 'cvs', 'rlog', "-r${OLD}::${NEW}" '-SN', $module );

          my $cvs_in  = '';
          my $cvs_out = '';
          my $cvs_err = '';

          run \@cmd, \$cvs_in, \$cvs_out, \$cvs_err
              or die("Unable to launch CVS\n");

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_out directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_out);
              process_stdout($fh);
          }

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_err directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_err);
              process_stderr($fh);
          }
      }

      Neither snippet has been tested.
      Either or both snippets may need to trap SIGPIPE.

        I agree with ikegami that select is necessary.  Neither of the snippets above will quite work as written, however.

        In the second one there's a missing comma in the my @cmd = ( ... ) line.

        In the first one, $buf is never defined, and even if it were, you'd be overwriting it with the sysread call.    (It's also not necessary to use the OFFSET argument to sysread, since you can't perform a seek on STDOUT or STDERR).

        You're also (in the first snippet) not getting anything written to STDERR, which puzzled me for a while, until I re-read the IPC::Open3 documentation more closely:

        If ERRFH is false, or the same file descriptor as RDRFH, then STDOUT and STDERR of the child are on the same filehandle.

        Here's a suggested rewrite, using a subroutine run_command.  You give it a single argument which is the command to run, and it returns two list-references; the first is the lines of STDOUT, the second the lines of STDERR:

        #!/usr/bin/perl -w

        # Strict
        use strict;
        use warnings;

        # Libraries
        use IO::File;
        use IO::Select;
        use IPC::Open3 qw/ open3 /;

        # Constants
        use constant BLOCK_SIZE => 4096;

        # User-defined
        my $module = "server";
        my $OLD = "1.41";
        my $NEW = "1.42";
        my $cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";
        my $outbuf = "";
        my $errbuf = "";

        ####################
        ### Main program ###
        ####################
        my ($pout, $perr) = run_command($cmd);

        # Verify data
        print "\n[STDOUT]\n";
        map { print "\e[102m$_\e[m\n" } @$pout;
        print "\n[STDERR]\n";
        map { print "\e[101m$_\e[m\n" } @$perr;

        # Now do whatever you want to with the data $pout and $perr ...

        ###################
        ### Subroutines ###
        ###################
        sub run_command {
            my ($cmd) = @_;

            my $errfh = IO::File::new();
            my $pid   = open3(my $infh, my $outfh, $errfh, $cmd);
            my $r_sel = IO::Select->new($outfh, $errfh);
            my $pfh   = { $outfh => [ ], $errfh => [ ] };

            while ($r_sel->handles()) {
                my @can_read = $r_sel->can_read(0);
                foreach my $fh (@can_read) {
                    my $pbuf = $pfh->{$fh};
                    my $rv = sysread($fh, my $text, BLOCK_SIZE);
                    (defined $rv) or die "Failed cmd '$cmd' ($!)\n";
                    $rv or $r_sel->remove($fh);
                    push @$pbuf, $text;
                }
            }
            waitpid($pid, 0);

            my $pout = [ split(/\n/, join("", @{$pfh->{$outfh}})) ];
            my $perr = [ split(/\n/, join("", @{$pfh->{$errfh}})) ];
            return ($pout, $perr);
        }

      Wow. Thanks. This is exactly what I'm looking for.
Re: Parsing STDERR and STDOUT at the same time
by Tanktalus (Canon) on Feb 01, 2007 at 21:36 UTC

    If you want stdout and stderr in separate streams, you can do this with a nice mixture of fork, open, select (or IO::Select) and other goodies... or you can just use IPC::Open3. I've used IPC::Open3 for this type of thing before - I still used IO::Select to make sure I am reading from both handles without letting either buffer fill up.

Re: Parsing STDERR and STDOUT at the same time
by kyle (Abbot) on Feb 01, 2007 at 20:02 UTC

    You can redirect STDERR to STDOUT using the shell.

    my $cvs_cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";
    my $cvs_pid = open my $cvs_pipe, '-|', "$cvs_cmd 2>&1"
        or die "Can't read pipe: $!";

    Then read from $cvs_pipe like any other file handle. The STDERR output will be mixed in with the STDOUT output.

    I hear IPC::Run is good for this kind of thing, but I've never used it.

      That's true about blending <STDERR> and <STDOUT> in the CVS command itself, but I don't want to mix up STDERR and STDOUT.

      I process STDOUT for the names of the files and the changes. The flow is pretty straightforward for finding the comments: lines after /^date:/ begin a comment, and comments end with a /^(-|=)+$/ line. If STDERR were included in the mix, it would be much harder to separate out that information.
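
      For example, the STDOUT side boils down to a small state machine along these lines (an untested sketch; where the lines come from and what gets done with each comment are placeholders):

      my $in_comment = 0;
      my @comment;
      while (my $line = <>) {                  # or whatever handle STDOUT is read from
          chomp $line;
          if ($in_comment) {
              if ($line =~ /^(-|=)+$/) {       # a separator line ends the comment
                  print "comment: $_\n" for @comment;    # do something useful here
                  @comment    = ();
                  $in_comment = 0;
              }
              else {
                  push @comment, $line;
              }
          }
          elsif ($line =~ /^date:/) {          # the following lines are the comment
              $in_comment = 1;
          }
      }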

      I also depend upon the fact that when a file isn't tagged with either release, the two lines that report this information appear one after the other in <STDERR>. This makes it very easy to determine when a file has one tag, but not the other (which means it was either added or deleted from the current release). If <STDOUT> and <STDERR> were merged, I could no longer make that assumption.
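
      To illustrate what I mean, the STDERR pass could lean on that ordering with a simple look-ahead, roughly like this (again untested; it assumes $OLD, $NEW, the captured lines in @err_lines, the warning text shown above, and that the `$OLD' warning comes first in each adjacent pair):

      for (my $i = 0; $i < @err_lines; $i++) {
          if ($err_lines[$i] =~ /warning: no revision `\Q$OLD\E' in `(.+),v'/) {
              my $file = $1;
              # If the very next line is the matching `$NEW' warning for the same
              # file, the file has neither tag, so skip the pair.
              if (defined $err_lines[$i + 1]
                  && $err_lines[$i + 1] =~ /warning: no revision `\Q$NEW\E' in `\Q$file\E,v'/) {
                  $i++;
                  next;
              }
              print "added since $OLD: $file\n";
          }
          elsif ($err_lines[$i] =~ /warning: no revision `\Q$NEW\E' in `(.+),v'/) {
              print "deleted since $OLD: $1\n";
          }
      }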