qazwart has asked for the wisdom of the Perl Monks concerning the following question:

Right now, I have a Kornshell script to do this, but I believe it will be faster and more efficient if I rewrite it in Perl.

One of the lines in my shell script does this:

$ cvs rlog -r$OLD::$NEW -SN $module 2> errors.txt > output.txt
I parse "output.txt" to find all the release notes and files that have been changed. In fact, one of the reasons I want to redo this in Perl is that I believe I can do this part of the job more efficiently in Perl. No real problem here. Very basic programming stuff.

The problem is with the "errors.txt" file. I take this file, and do quite a bit of parsing:

First I grep out all the lines that say "warning: no revision `$OLD' in" into one file. Then I grep out all the lines that say "warning: no revision `$NEW' in" into another file. After that, I do a unified diff on the two files. Lines that start with a "-" are for files that have been added since release $OLD. Lines that start with a "+" are for files that have been deleted since release $OLD.

It takes quite a bit of processing: I have to capture STDERR in a file, grep it twice, diff the two results, parse the diff output, and separate out the file name from the rest of each resulting line.

It should be much, much easier in Perl. I could do everything in a single pass and avoid all temporary files.
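
To give an idea of the kind of single pass I'm picturing, something along these lines (an untested sketch; the exact warning text and the way the STDERR lines get captured are assumptions, not verified against real cvs output):

use strict;
use warnings;

# Sketch only: $OLD, $NEW, and the source of @err_lines are placeholders.
# Assumed warning format:
#   cvs rlog: warning: no revision `1.41' in `path/to/file,v'
my $OLD = "1.41";
my $NEW = "1.42";
my @err_lines = <>;    # however the captured STDERR ends up here

my (%missing_old, %missing_new);
for my $line (@err_lines) {
    if ($line =~ /warning: no revision `\Q$OLD\E' in `(.+),v'/) {
        $missing_old{$1} = 1;
    }
    elsif ($line =~ /warning: no revision `\Q$NEW\E' in `(.+),v'/) {
        $missing_new{$1} = 1;
    }
}

# A file missing only revision $OLD was added since $OLD;
# a file missing only revision $NEW was deleted since $OLD.
print "added:   $_\n" for grep { !$missing_new{$_} } sort keys %missing_old;
print "deleted: $_\n" for grep { !$missing_old{$_} } sort keys %missing_new;

That would replace the two greps, the diff, and the temporary files with a couple of hashes.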

However, that's the problem. I can do an "open" on the CVS command to capture STDOUT as if it were a simple text file. But how in the world do I process the STDERR from that CVS command at the same time? I could (as I do in Kornshell) save the output to a file and open that file later, but it strikes me that there has to be a way to operate on both STDOUT and STDERR at the same time.

Replies are listed 'Best First'.
Re: Parsing STDERR and STDOUT at the same time
by liverpole (Monsignor) on Feb 01, 2007 at 22:02 UTC
    Hi qazwart,

    The following code is based on an example in perlfaq8 (under the question "How can I capture STDERR from an external command?"):

    #!/usr/bin/perl -w

    # Strict
    use strict;
    use warnings;

    # Libraries
    use IPC::Open3;
    use Symbol qw(gensym);
    use IO::File;

    ####################
    ### Main program ###
    ####################

    # User-defined
    my $module = "mymodule";
    my $OLD = "1.41";
    my $NEW = "1.42";
    my $cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";

    local *CATCHOUT = IO::File->new_tmpfile;
    local *CATCHERR = IO::File->new_tmpfile;

    my $pid = open3(gensym(), ">&CATCHOUT", ">&CATCHERR", $cmd);
    waitpid($pid, 0);

    seek(\*CATCHOUT, 0, 0);
    seek(\*CATCHERR, 0, 0);

    process_stdout(*CATCHOUT);
    process_stderr(*CATCHERR);

    sub process_stdout {
        my ($outfh) = @_;
        while (my $line = <$outfh>) {
            # Handle $line from STDOUT
            chomp $line;
            print "\e[102m$line\e[m\n";    # Change this
        }
    }

    sub process_stderr {
        my ($errfh) = @_;
        while (my $line = <$errfh>) {
            # Handle $line from STDERR
            chomp $line;
            print "\e[101m$line\e[m\n";    # Change this
        }
    }

    I even tested it on a cvs module locally to make sure it handles both STDOUT and STDERR from the cvs rlog command correctly.

    At the moment it just displays the lines it reads (in different colors: green for STDOUT, red for STDERR), but you can modify the two lines marked "Change this" above to suit your needs.


      That will deadlock. If either the STDOUT or the STDERR pipe fills up, waitpid will never return. That's why select is required. (Or in this case, can_read since we don't feed any input to cvs.)
      #!/usr/bin/perl

      use strict;
      use warnings;

      use IO::Select qw( );
      use IPC::Open3 qw( open3 );

      use constant BLOCK_SIZE => 4096;

      sub process_stdout {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[out:$_]\n");
          }
      }

      sub process_stderr {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[err:$_]\n");
          }
      }

      {
          # User-defined
          my $module = "mymodule";
          my $OLD = "1.41";
          my $NEW = "1.42";
          my $cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";

          my ($fh_cvs_in, $fh_cvs_out, $fh_cvs_err);
          my $pid = open3($fh_cvs_in, $fh_cvs_out, $fh_cvs_err, $cmd);

          my $r_sel = IO::Select->new($fh_cvs_out, $fh_cvs_err);

          my $cvs_out = '';
          my $cvs_err = '';

          while ($r_sel->handles()) {
              my @r = $r_sel->can_read();
              foreach my $fh (@r) {
                  if ($fh == $fh_cvs_out) {
                      my $rv = sysread($fh, $buf, BLOCK_SIZE, length($buf));
                      if (not defined $rv) {
                          die("Unable to communicate with CVS: $!\n");
                      }
                      if (not $rv) {
                          # End of file
                          $r_sel->remove($fh_cvs_out);
                      }
                  }
                  elsif ($fh == $fh_cvs_err) {
                      my $rv = sysread($fh, $buf, BLOCK_SIZE, length($buf));
                      if (not defined $rv) {
                          die("Unable to communicate with CVS: $!\n");
                      }
                      if (not $rv) {
                          # End of file
                          $r_sel->remove($fh_cvs_err);
                      }
                  }
              }
          }

          waitpid($pid, 0);

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_out directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_out);
              process_stdout($fh);
          }

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_err directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_err);
              process_stderr($fh);
          }
      }

      IPC::Run is designed to simplify this, but I have never used it.

      use strict;
      use warnings;

      use IPC::Run qw( run );

      sub process_stdout {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[out:$_]\n");
          }
      }

      sub process_stderr {
          my ($fh) = @_;
          local *_;    # Protect caller's $_
          while (<$fh>) {
              chomp;
              print("[err:$_]\n");
          }
      }

      {
          # User-defined
          my $module = "mymodule";
          my $OLD = "1.41";
          my $NEW = "1.42";
          my @cmd = ( 'cvs', 'rlog', "-r${OLD}::${NEW}" '-SN', $module );

          my $cvs_in  = '';
          my $cvs_out = '';
          my $cvs_err = '';

          run \@cmd, \$cvs_in, \$cvs_out, \$cvs_err
              or die("Unable to launch CVS\n");

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_out directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_out);
              process_stdout($fh);
          }

          {
              # Requires Perl 5.8.
              # In Perl 5.6, use IO::Scalar
              # or work with $csv_err directly.
              use 5.008000;
              open(my $fh, '<', \$cvs_err);
              process_stderr($fh);
          }
      }

      Neither snippet has been tested.
      Either or both snippets may need to trap SIGPIPE.

        I agree with ikegami that select is necessary.  Neither of the snippets above will quite work as written, however.

        In the second one there's a missing comma in the my @cmd = ( ... ) line.

        In the first one, $buf is never defined, and even if it were, you'd be overwriting it with the sysread call.    (It's also not necessary to use the OFFSET argument to sysread, since you can't perform a seek on STDOUT or STDERR).

        You're also (in the first snippet) not getting anything written to STDERR, which puzzled me for a while, until I re-read the IPC::Open3 documentation more closely:

        If ERRFH is false, or the same file descriptor as RDRFH, then STDOUT and STDERR of the child are on the same filehandle.

        Here's a suggested rewrite, using a subroutine run_command.  You give it a single argument which is the command to run, and it returns two list-references; the first is the lines of STDOUT, the second the lines of STDERR:

        #!/usr/bin/perl -w

        # Strict
        use strict;
        use warnings;

        # Libraries
        use IO::File;
        use IO::Select;
        use IPC::Open3 qw/ open3 /;

        # Constants
        use constant BLOCK_SIZE => 4096;

        # User-defined
        my $module = "server";
        my $OLD = "1.41";
        my $NEW = "1.42";
        my $cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";
        my $outbuf = "";
        my $errbuf = "";

        ####################
        ### Main program ###
        ####################
        my ($pout, $perr) = run_command($cmd);

        # Verify data
        print "\n[STDOUT]\n";
        map { print "\e[102m$_\e[m\n" } @$pout;
        print "\n[STDERR]\n";
        map { print "\e[101m$_\e[m\n" } @$perr;

        # Now do whatever you want to with the data $pout and $perr ...

        ###################
        ### Subroutines ###
        ###################
        sub run_command {
            my ($cmd) = @_;

            my $errfh = IO::File::new();
            my $pid   = open3(my $infh, my $outfh, $errfh, $cmd);
            my $r_sel = IO::Select->new($outfh, $errfh);
            my $pfh   = { $outfh => [ ], $errfh => [ ] };

            while ($r_sel->handles()) {
                my @can_read = $r_sel->can_read(0);
                foreach my $fh (@can_read) {
                    my $pbuf = $pfh->{$fh};
                    my $rv = sysread($fh, my $text, BLOCK_SIZE);
                    (defined $rv) or die "Failed cmd '$cmd' ($!)\n";
                    $rv or $r_sel->remove($fh);
                    push @$pbuf, $text;
                }
            }
            waitpid($pid, 0);

            my $pout = [ split(/\n/, join("", @{$pfh->{$outfh}})) ];
            my $perr = [ split(/\n/, join("", @{$pfh->{$errfh}})) ];
            return ($pout, $perr);
        }

      Wow. Thanks. This is exactly what I'm looking for.
Re: Parsing STDERR and STDOUT at the same time
by Tanktalus (Canon) on Feb 01, 2007 at 21:36 UTC

    If you want stdout and stderr in separate streams, you can do this with a nice mixture of fork, open, select (or IO::Select) and other goodies... or you can just use IPC::Open3. I've used IPC::Open3 for this type of thing before - I still used IO::Select to make sure I am reading from both handles without letting either buffer fill up.

Re: Parsing STDERR and STDOUT at the same time
by kyle (Abbot) on Feb 01, 2007 at 20:02 UTC

    You can redirect STDERR to STDOUT using the shell.

    my $cvs_cmd = "cvs rlog -r${OLD}::${NEW} -SN $module";
    my $cvs_pid = open my $cvs_pipe, '-|', "$cvs_cmd 2>&1"
        or die "Can't read pipe: $!";

    Then read from $cvs_pipe like any other file handle. The STDERR output will be mixed in with the STDOUT output.

    I hear IPC::Run is good for this kind of thing, but I've never used it.

      That's true about blending <STDERR> and <STDOUT> in the CVS command itself, but I don't want to mix up STDERR and STDOUT.

      I process STDOUT for the names of the files and the changes. The flow is pretty straightforward for finding the comments: lines after /^date:/ begin a comment, and comments end with a /^(-|=)+$/ line. If STDERR were included in the mix, it would be much harder to separate out that information.
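
      For example, the STDOUT side boils down to a small state machine along these lines (an untested sketch; where the lines come from and what gets done with each comment are placeholders):

      my $in_comment = 0;
      my @comment;
      while (my $line = <>) {                  # or whatever handle STDOUT is read from
          chomp $line;
          if ($in_comment) {
              if ($line =~ /^(-|=)+$/) {       # a separator line ends the comment
                  print "comment: $_\n" for @comment;    # do something useful here
                  @comment    = ();
                  $in_comment = 0;
              }
              else {
                  push @comment, $line;
              }
          }
          elsif ($line =~ /^date:/) {          # the following lines are the comment
              $in_comment = 1;
          }
      }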

      I also depend upon the fact that when a file isn't tagged with either release, the two lines that report this information appear one after the other in <STDERR>. This makes it very easy to determine when a file has one tag, but not the other (which means it was either added or deleted from the current release). If <STDOUT> and <STDERR> were merged, I could no longer make that assumption.
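
      To illustrate what I mean, the STDERR pass could lean on that ordering with a simple look-ahead, roughly like this (again untested; it assumes $OLD, $NEW, the captured lines in @err_lines, the warning text shown above, and that the `$OLD' warning comes first in each adjacent pair):

      for (my $i = 0; $i < @err_lines; $i++) {
          if ($err_lines[$i] =~ /warning: no revision `\Q$OLD\E' in `(.+),v'/) {
              my $file = $1;
              # If the very next line is the matching `$NEW' warning for the same
              # file, the file has neither tag, so skip the pair.
              if (defined $err_lines[$i + 1]
                  && $err_lines[$i + 1] =~ /warning: no revision `\Q$NEW\E' in `\Q$file\E,v'/) {
                  $i++;
                  next;
              }
              print "added since $OLD: $file\n";
          }
          elsif ($err_lines[$i] =~ /warning: no revision `\Q$NEW\E' in `(.+),v'/) {
              print "deleted since $OLD: $1\n";
          }
      }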