whatwhat has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have the following problem. I have 24 subprocesses that I wish to start on 24 different remote node computers. I can do this successfully using a Perl fork. However, for debugging purposes I wish to see the commands issued to start the child processes. I try to do this with a simple 'print' statement that writes the issued command to a log file. The problem is this: I do 24 forks, yet a maximum of 18 commands are printed to the log file. I thought there should be 24 commands printed, since I do 24 forks! Even more surprising to me is that all 24 processes still run correctly.

What is going on? Here is my forking code, you can see the 'print' statement in the 'child process' section:

if($phase==1)  # Setup phase
{
    # set up child signal handler
    $SIG{'CHLD'} = \&$sub;
    $|++;
    %fhlist; %fhlist2; %fhlist3;
}
elsif($phase==2)  # Spawn the jobs phase
{
    # Create an anonymous file handle
    $pid = fork();
    if($pid < 0 or not defined $pid)
    {
        print LOG "$#-> Can't fork! Bad kernel!";
        close LOG;
        die "$#-> Can't fork! Bad kernel!";
    }
    elsif($pid == 0)
    {
        # child process
        print JUNKD "/usr/bin/rsh $proc $cmd\n";
        system("/usr/bin/rsh $proc $cmd");
        exit(0);
    }
    else
    {
        # Parent process, toss child file handle into the hash
        # and move on with our lives.
        $fhlist{"$pid"}  = $nt;
        $fhlist2{"$pid"} = $mc;
        $fhlist3{"$pid"} = $um;
    }
}
elsif($phase==3)  # Wait till the children are done phase
{
    while(1)
    {
        @kl = keys(%fhlist);
        if($#kl >= 0)
        {
            # mo' to do...
            sleep($sleep);
        }
        else
        {
            last;
        }
    }
}
Any ideas or suggestions would be greatly appreciated! Thanks!

Replies are listed 'Best First'.
Re: Cannot print child process command during fork
by ikegami (Patriarch) on Aug 02, 2007 at 04:46 UTC

    It's hard to see why you get 18 vs 24 when the code you provided only creates one child. Could you provide some runnable code?

    By the way,

    system("/usr/bin/rsh $proc $cmd");
    exit(0);

    is an expensive way of doing

    exec("/usr/bin/rsh $proc $cmd");
    exit(0);  # In case exec fails.

    Using system will needlessly fork a second time.
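    To make that point concrete, here is a minimal, runnable sketch of the fork-then-exec pattern. The perl one-liner command is a stand-in for the original `/usr/bin/rsh $proc $cmd` invocation, which not everyone can run:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for "/usr/bin/rsh $proc $cmd"; any external command works.
my @command = ($^X, '-e', 'print "hello from the child\n"');

my $pid = fork();
die "Can't fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: replace this process image with the command directly,
    # instead of calling system(), which would fork a second time.
    exec @command or die "exec failed: $!";  # die only runs if exec fails
}

# Parent: wait for that specific child and check its exit status.
waitpid($pid, 0);
my $status = $? >> 8;
```

    The list form of exec also avoids passing the command through a shell, which matters once $cmd contains spaces or metacharacters.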

      I solved my problem: all I needed to do was move the print statement before the fork. It prints out all the commands now. I'll try the exec function and make sure it works.

      I only included the forking subroutine in the previous post. For the sake of completeness here is a condensed (runnable) version of my code:

#!/usr/bin/perl
use IO::File;
use POSIX ":sys_wait_h";

open(JUNKD, ">test_rsh-commands.txt");

# Phase 1: Setup phase to spawn jobs
&spawn_jobs(1, handle_child);

for ($i = 1; $i <= 24; $i++)  # BEGIN: loop
{
    $proc = "sp" . "$i";  # remote node to run command on
    $cmd  = "date";       # simple test command
    # Phase 2: Spawn the jobs
    &spawn_jobs(2, $proc, $cmd, 1, 2, 3);
}  # END: loop

# Phase 3: Wait for the jobs to finish
&spawn_jobs(3, 26);
close JUNKD;

## BEGIN: Spawn children jobs on slave nodes ##
sub spawn_jobs
{
    my @a = @_;
    my ($phase, $i, $proc, $nt, $mc, $um, $sleep, $sub);

    $phase = $a[0];
    if    ($phase == 1) { $sub = $a[1] }
    elsif ($phase == 2) { ($proc, $cmd, $nt, $mc, $um) = @a[1..5] }
    elsif ($phase == 3) { $sleep = $a[1] }

    if ($phase == 1)  # Setup phase
    {
        # set up child signal handler
        $SIG{'CHLD'} = \&$sub;
        $|++;
        %fhlist; %fhlist2; %fhlist3;
    }
    elsif ($phase == 2)  # Spawn the jobs phase
    {
        # Create an anonymous file handle
        print JUNKD "/usr/bin/rsh $proc $cmd\n";
        $pid = fork();
        if ($pid < 0 or not defined $pid)
        {
            print LOG "$#-> Can't fork! Bad kernel!";
            close LOG;
            die "$#-> Can't fork! Bad kernel!";
        }
        elsif ($pid == 0)
        {
            # child process
            # system("/usr/bin/rsh $proc $cmd");
            # I'm commenting out the above line, since not everyone
            # has 24 remote nodes to run on.
            # system("$cmd");
            exec("$cmd");
            exit(0);
        }
        else
        {
            # Parent process, toss child file handle into the hash
            # and move on with our lives.
            $fhlist{"$pid"}  = $nt;
            $fhlist2{"$pid"} = $mc;
            $fhlist3{"$pid"} = $um;
        }
    }
    elsif ($phase == 3)  # Wait till the children are done phase
    {
        while (1)
        {
            @kl = keys(%fhlist);
            if ($#kl >= 0)
            {
                # mo' to do...
                sleep($sleep);
            }
            else
            {
                last;
            }
        }
    }
}
## END: Spawn children jobs on slave nodes ##

sub handle_child
{
    # This gets called when a child dies... maybe more than one
    # died at the same time, so it's best to do this in a loop
    my ($temp, $mcopy, $umbr, $nbias, $nmat);
    while (($dead_kid = waitpid(-1, WNOHANG)) > 0)
    {
        $temp  = $fhlist{"$dead_kid"};  # get the file descriptor back
        $mcopy = $fhlist2{"$dead_kid"};
        $umbr  = $fhlist3{"$dead_kid"};
        delete($fhlist{"$dead_kid"});
        delete($fhlist2{"$dead_kid"});
        delete($fhlist3{"$dead_kid"});
    }
}
      I supplied a simple command (date), but in my full code I am issuing commands to run other programs on the remote node.

      Thanks for everyone's input!

Re: Cannot print child process command during fork
by NetWallah (Canon) on Aug 02, 2007 at 04:47 UTC
    Forking a child process only to run a brief system command seems pointless, because system already does a fork.

    To answer your question, simultaneous writing to the same file handle by so many child processes probably causes them to clobber each other occasionally.
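    A minimal sketch of that buffering hazard, and the usual fix: enable autoflush on the shared handle, so each line reaches the file before any fork() copies the stdio buffer into a child. (The filename and the do-nothing child command are made up for the demo.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

open my $log, '>', 'demo-log.txt' or die "open: $!";

# Without this, a line printed just before fork() can sit in the
# buffer, get duplicated into the child, and be flushed twice --
# or be lost if the child exits abnormally before flushing.
my $old = select($log); $| = 1; select($old);

for my $i (1 .. 4) {
    print {$log} "starting job $i\n";  # flushed immediately
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;
    if ($pid == 0) {
        exec $^X, '-e', '1' or die "exec: $!";  # child does nothing
    }
}
1 while wait() != -1;  # reap all children
close $log or die "close: $!";
```

    With autoflush on, the log ends up with exactly one line per fork, regardless of what the children do afterwards.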

    I would recommend getting rid of the fork entirely - simply put 24 system calls in a loop.

         "An undefined problem has an infinite number of solutions." - Robert A. Humphrey         "If you're not part of the solution, you're part of the precipitate." - Henry J. Tillman

        Since no useful work appears to be done at the conclusion of the forked child processes, and $proc and $cmd were unspecified, I assumed that $cmd was a brief command that started a remote process, and exited. If this is the case, the difference between serial and parallel processing would not be significant.

        So, point taken. I should have clarified my assumptions.


      In my code the system command calls another program to run on the remote node. The program on the remote node can take several hours to run. So I think I need the explicit perl 'fork' command, so that I can start, for example, 24 independent calculations on 24 different compute nodes (computers) all running at the same time.

      I think a loop of system calls would run them serially. Even if system exited without waiting for the program to finish, this would be undesirable for my needs. I need the parent process to monitor when the child (the program on the remote node) is finished.
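      A sketch of that parent-side monitoring, using a blocking waitpid() instead of a sleep() polling loop. The sleeping one-liner stands in for the hours-long remote calculation, and the %jobs hash mirrors the role of %fhlist in the posted code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Track each child by pid, as %fhlist does in the original code.
my %jobs;
for my $i (1 .. 4) {
    my $pid = fork();
    die "Can't fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: stand-in for the long-running remote command.
        exec $^X, '-e', 'select undef, undef, undef, 0.2'
            or die "exec: $!";
    }
    $jobs{$pid} = "job $i";
}

# Parent blocks until any child exits, then records which one.
while (%jobs) {
    my $dead = waitpid(-1, 0);
    last if $dead == -1;  # no children left to reap
    print "finished: $jobs{$dead}\n";
    delete $jobs{$dead};
}
```

      The children all run in parallel; the parent learns of each completion as it happens, without burning CPU in a polling loop.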

      Thanks for the explanation of 'clobbering'!