italdesign has asked for the wisdom of the Perl Monks concerning the following question:

Update (as posted in a reply)

I also suspect there is an easier way to accomplish my goal. Well, there's no better place to ask!

Background: A colleague with a large code base reached out to me for help. There are places in his code where he executes a series of piped Unix cmds, like "cmd1|cmd2|cmd3". If any cmd in the series fails, he wants to identify exactly which one. This can't be done with a regular system call, because that only gives you the status of the last cmd in the series. Enter IPC::Run. It worked like a charm until we started having "pipe out" cmds (not sure what the proper term is), like "|cmd1|cmd2|cmd3", where he opens the FH for the cmd first and passes data to it later. I have to redesign the module that runs IPC::Run, and that's where I'm stuck. BTW, since the task is to implement the feature in his existing (and complicated) code, I want to wrap all the logic into my modules to keep his code changes to a minimum.
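For reference, here is a minimal sketch of that plain case (no "pipe out") that worked for us. It is not from the real code base; it just assumes IPC::Run's documented start()/finish()/results() interface, holds input and output in scalar buffers, and reuses the example pipeline shown further down.

use strict;
use warnings;
use IPC::Run;

# Same example pipeline as below; the grep is expected to fail (exit 1).
my @cmds = ( ['sort'], ['uniq', '-c'], [qw(grep zzzzzzzz)], ['sort', '-rnk1'] );

my $in  = join '', map { int(rand 5) . "\n" } 1 .. 5;
my ($out, $err) = ('', '');

# Build "cmd1 | cmd2 | ..." in IPC::Run's harness-spec form.
my @spec = map { ($_, '|') } @cmds;
pop @spec;                                  # drop the trailing '|'

my $h = IPC::Run::start(@spec, '<', \$in, '>', \$out, '2>', \$err);

unless ($h->finish) {                       # finish() is false if any child exited non-zero
    my @status = $h->results;               # one exit value per cmd, in pipeline order
    for my $i (0 .. $#cmds) {
        next unless $status[$i];
        print "Failed! cmd = '@{$cmds[$i]}' (exit $status[$i])\n";
        last;
    }
}
print $out;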

I've narrowed down the problem to this: after IPC::Run accepts the cmds, it runs each one as a child process. The finish() method of the harness object (i.e. $ipc_run_h->finish()) calls waitpid() on each child cmd. If it gets -1, it sets $? to "unknown result, unknown PID". This is what I'm getting, and I eventually figured out that it's because I'm spawning the child cmds in the parent but waitpid'ing in a child process of my own.

Why do that, you ask? First, if IPC::Run::start() and $ipc_run_h->finish() run in the same process, then all is well. This is what I did for regular cmds. With "pipe out" cmds like "|cmd1|cmd2|cmd3", I must pump data to the FH between calling start() and finish(). I also need to be able to process the output as it becomes available (rather than wait for the entire series to finish), since some cmds take a long time. This is why I fork and have the child run $ipc_run_h->finish(), which leaves start() and finish() in different processes.
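To make the "pipe out" requirement concrete, here is a rough single-process sketch (again, not the real module). It assumes IPC::Run's documented '<pipe' and '>pipe' redirections; since start() and finish() stay in one process, finish() can reap its own children and results() reports each cmd's exit status. It writes all the input before reading, which is fine for small amounts of data but could block on very large ones, and it ignores stderr for brevity (the real module merges it into the output).

use strict;
use warnings;
use IPC::Run;

my @cmds = ( ['sort'], ['uniq', '-c'], [qw(grep zzzzzzzz)], ['sort', '-rnk1'] );

my @spec = map { ($_, '|') } @cmds;
pop @spec;                                  # drop the trailing '|'

# '<pipe' hands us the write end of the pipeline's stdin; '>pipe' hands us
# the read end of its stdout.
my $h = IPC::Run::start(@spec, '<pipe', \*IN, '>pipe', \*OUT);

print IN int(rand 5), "\n" for 1 .. 5;      # pump some data to the pipeline
close IN;                                   # send EOF so the pipeline can drain

while (my $line = <OUT>) {                  # process output as it becomes available
    print $line;
}
close OUT;

unless ($h->finish) {                       # reaps the children this process started
    my @status = $h->results;
    my ($i) = grep { $status[$_] } 0 .. $#cmds;
    print "Failed! cmd = '@{$cmds[$i]}' (exit $status[$i])\n" if defined $i;
}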

Now that you know about the problem and the goal, here's the complete code for illustrative purposes.

First, the main program, runPipe_PerlMonks.pl

# Uses RunPipe_PerlMonks module to run and trap failures in piped commands.
use strict;
use warnings;
use lib "/PATH_TO_RunPipe_PerlMonks";
use RunPipe_PerlMonks;

# Here we pass a series of piped cmds to be run by IPC::Run.
# The 3rd cmd, "grep zzzzzzzz", will fail. Goal is to identify it as the cmd
# that made the series fail (a normal system call only gets the status of the
# last cmd, which in this case is a (false) success).
my @cmds = ( ['sort'], ['uniq', '-c'], [qw(grep zzzzzzzz)], ['sort', '-rnk1'] );
my $rp = new RunPipe_PerlMonks('cmds' => \@cmds);

# This will open the FH for the cmd, like "open FH_GLOB, '|-', 'sort | uniq -c ...'",
# but it must be done by IPC::Run::start() since we're using it to run the cmds.
# Must be done before passing data to the FH.
$rp->start(\*FH_GLOB);

####### Now pump some data to the FH ######
my $max = 5;
for (my $i = 0; $i < $max; $i++)
{
    print FH_GLOB int(rand($max)) . "\n";
}
close FH_GLOB;
#############################################

# Get the FH that will contain the output so we can process it in real time
# (rather than waiting for everything to finish first).
my $fh = $rp->fh; # or die "Error: \$rp->fh not defined.\n";

# This forks. The parent returns to the main program (i.e. here) immediately
# so we can process the results while waiting for them to finish. The child
# calls $ipc_run_h->finish(), which waits for all cmds (i.e. children) to
# finish. It doesn't work because it's a child waiting for another child.
# Is there a better way to do this?
$rp->run();

while (my $line = <$fh>)
{
    # do something with each output line.
    print $line;
}
close $fh;

my $failed_cmd = RunPipe_PerlMonks::failed_cmd;

# Important: we want to identify the exact cmd that failed in a series of
# piped cmds. A normal system call only returns the status of the last cmd.
# This is why we're using IPC::Run.
if (defined $failed_cmd)
{
    print "Failed! cmd = '$failed_cmd'\n";
}
Here's RunPipe_PerlMonks.pm
package RunPipe_PerlMonks;

use strict;
use warnings;
use IPC::Shareable;
use IPC::Run;
use FileHandle;

my $failed_cmd;
my $glue = 'h_09';
my %options = (
    create    => 'yes',
    exclusive => 1,
    mode      => 0644,
    destroy   => 1,
);

# STATIC method.
sub failed_cmd
{
    IPC::Shareable->clean_up_all;
    return $failed_cmd;
}

sub new
{
    shift;
    my $self = { 'cmds' => undef, @_ };
    defined $self->{'cmds'} or die "ERROR: didn't get CMD in constructor.";
    IPC::Shareable->clean_up_all;
    bless $self;
    return $self;
}

sub fh
{
    my $self = shift;
    return $self->{'readfh'};
}

# Input is an array of array references of commands. For example:
# ( ["ls", "-al"], ["grep", "-v", "something"] );
sub start
{
    my $self        = shift;
    my $fh_glob_ref = shift;
    my @cmds = @{$self->{'cmds'}} or die "Error: can't start() when CMDS not defined!\n";

    my ($readfh, $writefh) = FileHandle::pipe;
    $self->{'readfh'}  = $readfh;
    $self->{'writefh'} = $writefh;

    my @ipcArray = ();
    my $ipc_run_h;

    # Pipe each command to the next.
    foreach my $cmd (@cmds)
    {
        push @ipcArray, $cmd;
        if (defined $fh_glob_ref && @ipcArray == 1)
        {
            # ('<pipe', $fh) only needs to be added to the first cmd. DO NOT
            # add it to later cmds (doing so causes only the last cmd to be run).
            push @ipcArray, '<pipe', $fh_glob_ref;
        }
        push @ipcArray, "|";
    }
    pop @ipcArray;    # get rid of the last "|"

    # Capture both stdout and stderr in $writefh.
    $ipc_run_h = IPC::Run::start(@ipcArray, '>', $writefh, "2>", $writefh);
    $self->{'h'} = $ipc_run_h;
}

sub run
{
    my $self      = shift;
    my $ipc_run_h = $self->{'h'};
    my $readfh    = $self->{'readfh'};
    my $writefh   = $self->{'writefh'};
    defined $readfh    or die "Error: READFH not defined\n";
    defined $writefh   or die "Error: WRITEFH not defined\n";
    defined $ipc_run_h or die "Error: ipc_run_h not defined\n";

    eval { tie($failed_cmd, 'IPC::Shareable', $glue, { %options }) };
    $@ and die "ERROR: GLUE already bound.\n";

    # Run the cmds as a child process, so we can return the FH immediately to
    # the running program and process the output in real time.
    my $pid = fork();
    if ($pid) # parent
    {
        my $child_pid = waitpid($pid, 0);
        if ($child_pid == -1)
        {
            die "Child stopped with an error! ($!)\n";
        }
        elsif ($child_pid == 0) # child still running
        {
            die "ERROR: Child still running, but it should not have returned until done...\n";
        }
        # If it gets here, the child has finished. Close the parent's copy of writeFH.
        # Note: DON'T close the readFH. It needs to remain open for the driver
        # program; it will trigger EOF when writeFH is closed.
        close $writefh;
        return;
    }

    ####### From here onward, it's the child running.
    #******* Now call $ipc_run_h->finish(), which waitpid()s each child.
    #******* THIS is the failure point in the code, because now we're running
    #******* as the child, and we need to waitpid() for other children (each
    #******* cmd in the series).
    #******* Since a child can't waitpid() another child, waitpid() inside
    #******* $ipc_run_h->finish() always returns -1, the correct exit code is
    #******* never returned, and we can't identify the exact cmd that failed.
    if (!$ipc_run_h->finish()) # failure at some point
    {
        my $ctr = 0;
        foreach my $cmd (@{$self->{'cmds'}})
        {
            # Find the point where the failure occurred; get the return value
            # at that point.
            if ($ipc_run_h->result($ctr) != 0)
            {
                my $returnval = $ipc_run_h->result($ctr);
                # Set the failed command.
                $failed_cmd = join(" ", @{$cmd});
                last;
            }
            $ctr++;
        }
    }
    close $readfh;
    close $writefh;
    exit;
}

1;

Running the above gives no output. An empty STDOUT is expected (since grep zzzzzzzz will have found nothing), but if it were working correctly it would have printed "grep zzzzzzzz" as the failed cmd (I define failure as anything with a non-zero $?).

Suggestions?

One alternative I can think of is to dump the data into a temp file rather than passing it through FH_GLOB. This would eliminate the need for "pipe out" cmds (everything worked before those came along), but I suspect there may be better ways still.
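For completeness, here's a rough sketch of what that temp-file variant might look like (illustrative only; it assumes File::Temp and IPC::Run's plain '<' redirection from a file name, and still streams the output through '>pipe'):

use strict;
use warnings;
use File::Temp qw(tempfile);
use IPC::Run;

my @cmds = ( ['sort'], ['uniq', '-c'], [qw(grep zzzzzzzz)], ['sort', '-rnk1'] );

# Spool the input to a temp file, so the pipeline just reads a file and no
# "pipe out" handle is needed.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} int(rand 5), "\n" for 1 .. 5;
close $tmp_fh;

my @spec = map { ($_, '|') } @cmds;
pop @spec;                                  # drop the trailing '|'

my $h = IPC::Run::start(@spec, '<', $tmp_name, '>pipe', \*OUT);
while (my $line = <OUT>) {                  # output still arrives as it is produced
    print $line;
}
close OUT;

unless ($h->finish) {
    my @status = $h->results;
    my ($i) = grep { $status[$_] } 0 .. $#cmds;
    print "Failed! cmd = '@{$cmds[$i]}'\n" if defined $i;
}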

===============================

Original Post

Here's the complete code illustrating what I'm trying to do. I can get into more details if needed, but basically I have to waitpid($pid1) inside child #2, and it's not working: child #2 doesn't seem to know about $pid1 (parent #2 does, but that's not what I need).
# This version uses subs instead of classes.
use strict;
use warnings;

my $pid1 = fork();
if ($pid1) # parent #1
{
    my $pid2 = fork();
    if ($pid2) # parent #2
    {
        # This is just to illustrate that waitpid($pid1) works in parent #2.
        # Wait for child #2 to finish so we're not interfering with its waitpid() call.
        waitpid($pid2, 0);
        my $waitpid1 = waitpid($pid1, 0);
        print "waitpid1 (in parent #2) = '$waitpid1'\n";
    }
    elsif ($pid2 == 0) # child #2
    {
        print "I'm child #2\n";
        #******* waitpid($pid1) inside child #2 will return -1.
        #******* It seems to not know about $pid1. How can I make it work?
        my $waitpid1 = waitpid($pid1, 0);
        print "waitpid1 (in child #2) = '$waitpid1'\n";
    }
}
elsif ($pid1 == 0) # child #1
{
    print "I'm child #1\n";
}
The output:
I'm child #1
I'm child #2
waitpid1 (in child #2) = '-1'
waitpid1 (in parent #2) = '25341'
Question: Is there a way to make child #2 know about $pid1 such that waitpid($pid1) inside child #2 works?

Replies are listed 'Best First'.
Re: waitpid for child #1 inside child #2: any way to make it work?
by shmem (Chancellor) on May 07, 2014 at 02:07 UTC
    Question: Is there a way to make child #2 know about $pid1 such that waitpid($pid1) inside child #2 works?

    No. waitpid($pid1) inside child #2 will never work. The parent waits for the child, not the other way round. Grandfather doesn't wait for grandsons, and the reverse holds, too.

    For the first child (which you name parent2), $pid1 is the parent pid. Inside that child you can get at $pid1 via getppid. If you want to get the grandfather's pid in the grandson, the grandson has to ask its father via some other IPC mechanism. Then the grandson may kill 0, $grandfatherpid to see if the grandfather process is still alive. See kill.
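    Roughly, as a self-contained sketch (the sleeps and prints are only there to make the timing visible; getppid() gives the parent's pid, and kill 0 only tells you whether a process still exists, it does not reap it):

    use strict;
    use warnings;

    my $pid1 = fork();
    die "fork: $!" unless defined $pid1;
    if ($pid1 == 0) {                  # child #1: pretend to do some work
        sleep 2;
        exit 0;
    }

    my $pid2 = fork();
    die "fork: $!" unless defined $pid2;
    if ($pid2 == 0) {                  # child #2: can't reap $pid1, but can watch it
        print "child #2: my parent is ", getppid(), "\n";
        while (kill 0, $pid1) {        # true while the process exists (or is a zombie)
            print "child #2: $pid1 is still alive\n";
            sleep 1;
        }
        print "child #2: $pid1 is gone\n";
        exit 0;
    }

    # Only the parent that forked them can reap child #1 and child #2.
    waitpid($_, 0) for $pid1, $pid2;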

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: waitpid for child #1 inside child #2: any way to make it work?
by RMGir (Prior) on May 06, 2014 at 21:55 UTC
    Child1 can wait on child2 because Child1 is Child2's parent.

    According to waitpid, you can only wait for the results of your own child processes. Which makes sense when you think about it: waitpid is how parent processes collect child process status, and it makes no sense outside that relationship.

    You'll have to pick another Interprocess Communication mechanism :)
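    For example, a plain pipe between the two processes would do: whichever process calls start()/finish() (and can therefore reap the cmds) writes the failed command back, and the other side just reads it. A rough sketch, with the pipeline-running part stubbed out:

    use strict;
    use warnings;

    # The child will run the pipeline (so start() and finish() happen in the
    # same process) and report the failed command back over this pipe.
    pipe(my $report_r, my $report_w) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                       # child
        close $report_r;
        # ... IPC::Run::start(...), pump data, $h->finish, $h->result($n) ...
        my $failed = "grep zzzzzzzz";      # stand-in for the cmd that finish() flagged
        print {$report_w} "$failed\n";
        close $report_w;
        exit 0;
    }

    close $report_w;                       # parent keeps only the read end
    my $failed_cmd = <$report_r>;          # undef if the child reported nothing
    close $report_r;
    waitpid($pid, 0);                      # the parent reaps its own child

    if (defined $failed_cmd) {
        chomp $failed_cmd;
        print "Failed! cmd = '$failed_cmd'\n";
    }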


    Mike
Re: waitpid for child #1 inside child #2: any way to make it work?
by Anonymous Monk on May 06, 2014 at 22:44 UTC

    From this code and from what appears to be the original thread, I can't tell why you need two fork()s like this; I smell an XY Problem. Could you describe what problem you're trying to solve with this code in the first place? (Or is this just an exercise in forking?) I strongly suspect there is an easier way to solve your original problem.


Re: waitpid for child #1 inside child #2: any way to make it work?
by Anonymous Monk on May 06, 2014 at 22:47 UTC

    Some very awkward logic. Understand that there's only one parent here: not a parent #1 and a parent #2, just the one parent creating two child processes.

Re: waitpid for child #1 inside child #2: any way to make it work?
by sundialsvc4 (Abbot) on May 07, 2014 at 16:12 UTC

    From Un*x man wait4:

    The wait4() and waitpid() calls will fail and return immediately if:
    [ECHILD]:  The process specified by pid does not exist or is not a child of the calling process ...

    See also perldoc perlport, which is a topic that’s all about operating-system dependent differences in behavior.

    Waiting for a process-id is specifically a mechanism designed to “reap” the final status from terminated (zombie) processes. And, the entire notion of “zombies” is specifically engineered to facilitate that rendezvous without introducing race-conditions. If you truly need to wait for an unrelated process to finish, one way to do that sort of thing is with a semaphore ... but the timing is very tricky. Even if you launch two processes in quick succession, you cannot guarantee that the second will not begin executing first, and so on.
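    Roughly, that semaphore rendezvous could look like this (a sketch using IPC::Semaphore; the waiter here happens to be the parent only to keep the example self-contained, and unrelated processes would need a shared key, e.g. from ftok, rather than IPC_PRIVATE):

    use strict;
    use warnings;
    use IPC::SysV qw(IPC_PRIVATE IPC_CREAT S_IRUSR S_IWUSR);
    use IPC::Semaphore;

    # One semaphore, initially 0.  The worker does a "V" (+1) when finished;
    # the waiter does a "P" (-1), which blocks until the V has happened.
    my $sem = IPC::Semaphore->new(IPC_PRIVATE, 1, S_IRUSR | S_IWUSR | IPC_CREAT)
        or die "semget: $!";
    $sem->setval(0, 0);

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                 # worker
        sleep 1;                     # stand-in for real work
        $sem->op(0, 1, 0);           # signal completion
        exit 0;
    }

    $sem->op(0, -1, 0);              # blocks until the worker signals
    print "worker finished (signalled via the semaphore)\n";
    waitpid($pid, 0);                # still reap the child to avoid a zombie
    $sem->remove;                    # clean up the SysV resource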

    You have a design problem here. You are going to have to re-think portions of that design in order to obtain a program that not only “works at all,” but works reliably in all cases.

Re: waitpid for child #1 inside child #2: any way to make it work?
by italdesign (Novice) on May 08, 2014 at 19:26 UTC
    Any thoughts on the expanded details I provided? I looked at the IPC page but nothing jumped out that I can use here.