Dear Monks,

I see 'rsh <defunct>' processes appear when I run the Perl code below. I don't know whether this is normal behavior, and even if it is abnormal I am not certain it causes any problems. I am hoping you monks can tell me (1) whether this is normal or abnormal behavior, (2) why these 'rsh <defunct>' processes occur, and (3) whether this could cause problems (in particular, could it lead to the parent losing track of the grandchildren)?

The goals and explanation of my code are the following: (1) Run 24 copies of a non-perl program on 24 remote nodes (computers). (2) When these 24 independent programs finish, the perl code recognizes this, does some analysis of the output, and then runs 24 more copies of the non-perl program on the 24 remote nodes. This process is repeated over-and-over.

Towards this end my Perl code is set up in the following way: it forks 24 times in a loop, and each forked process makes a system call that sends an 'rsh' command to a remote node, telling it to run the remote program. So my Perl script is the 'parent', the forked process that makes the system call is the 'child', and the rsh command to the remote node spawns a 'grandchild'. I use 'system' instead of 'exec' so that the child process waits for the grandchild to finish.
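
A minimal sketch of the parent/child/grandchild layout described above, under stated assumptions: the name run_batch is hypothetical, and a local command (the running perl binary, via $^X) stands in for the rsh call so the sketch runs without remote nodes. It is meant only to show who waits for whom, not to reproduce the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# run_batch is a hypothetical helper: fork $n children, each of which
# runs @cmd with system() and therefore waits for its own grandchild.
sub run_batch {
    my ($n, @cmd) = @_;
    my %running;    # child pid => job number

    for my $job (1 .. $n) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: system() forks the grandchild and reaps it here.
            my $status = system(@cmd);
            # _exit skips END blocks and buffer flushing inherited
            # from the parent.
            POSIX::_exit($status == 0 ? 0 : 1);
        }
        $running{$pid} = $job;    # parent records the child
    }

    # Parent: a blocking waitpid reaps each child exactly once.
    my %exit_code;
    while (%running) {
        my $pid = waitpid(-1, 0);
        last if $pid <= 0;                  # no children left
        next unless exists $running{$pid};
        my $job = delete $running{$pid};
        $exit_code{$job} = $? >> 8;
    }
    return \%exit_code;
}

# Assumed stand-in for "/usr/bin/rsh $proc $cmd".
my $result = run_batch(3, $^X, '-e', 'exit 0');
print "job $_ exited with $result->{$_}\n"
    for sort { $a <=> $b } keys %$result;
```

In this sketch the parent only ever waits on the children it forked itself; each grandchild is waited on by the child that called system().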

Here is the essence of my perl code (the original code is much longer):

#!/usr/bin/perl
use IO::File;
use POSIX ":sys_wait_h";

open(JUNKD,">test_rsh-commands.txt");

for($j=1;$j<=10;$j++)  # Number of time to resubmit the 24 subprocesses
{
    # Phase 1: Setup phase to spawn jobs
    &spawn_jobs(1,handle_child);

    for($i=1;$i<=24;$i++)  # BEGIN: Number of subprocesses
    {
        $proc = "sp" . "$i";  # remote node to run command on
        $cmd = "date";        # simple test command

        # Phase 2: Spawn the jobs
        &spawn_jobs(2,$proc,$cmd,1,2,3);
    }  # END: loop

    # Phase 3: Wait for the jobs to finish
    &spawn_jobs(3,26);
    close JUNKD;
}

## BEGIN: Spwan children jobs on slave nodes ##
sub spawn_jobs
{
    my @a=@_;
    my $phase,$i,$proc,$nt,$mc,$um,$sleep,$sub;

    $phase = $a[0];
    if($phase==1)    { $sub = $a[1] }
    elsif($phase==2) { ($proc,$cmd,$nt,$mc,$um) = @a[1..5] }
    elsif($phase==3) { $sleep = $a[1] }

    if($phase==1)  # Setup phase
    {
        # set up child signal handler
        $SIG{'CHLD'} = \&$sub;
        $|++;
        %fhlist;
        %fhlist2;
        %fhlist3;
    }
    elsif($phase==2)  # Spawn the jobs phase
    {
        # Create an anonymous file handle
        $pid = fork();
        if($pid < 0 or not defined $pid)
        {
            print LOG "$#-> Can't fork! Bad kernel!";
            close LOG;
            die "$#-> Can't fork! Bad kernel!";
        }
        elsif($pid == 0)
        {
            # child process
            print JUNKD "/usr/bin/rsh $proc $cmd\n";
            # system("/usr/bin/rsh $proc $cmd");
            # I'm commmenting out the above line, since not everyone
            # has 24 remote nodes to run on.
            # system("$cmd");
            exec("$cmd");
            exit(0);
        }
        else
        {
            # Parent process, toss child file handle into the hash and move on with
            # our lives.
            $fhlist{"$pid"} = $nt;
            $fhlist2{"$pid"} = $mc;
            $fhlist3{"$pid"} = $um;
        }
    }
    elsif($phase==3)  # Wait till the children are done phase
    {
        while(1)
        {
            @kl = keys(%fhlist);
            if($#kl >= 0)
            {
                # mo' to do...
                sleep($sleep);
            }
            else
            {
                last;
            }
        }
    }
}
### END: Spwan children jobs on slave nodes ##

sub handle_child
{
    # This gets called when a child dies... maybe more than one
    # died at the same time, so it's best to do this in a loop
    my $temp, $mcopy, $umbr, $nbias, $nmat;
    while(($dead_kid = waitpid(-1, WNOHANG)) > 0)
    {
        $temp = $fhlist{"$dead_kid"};  # get the file descriptor back
        $mcopy = $fhlist2{"$dead_kid"};
        $umbr = $fhlist3{"$dead_kid"};
        delete($fhlist{"$dead_kid"});
        delete($fhlist2{"$dead_kid"});
        delete($fhlist3{"$dead_kid"});
    }
}

Here is the evidence that 'rsh <defunct>' processes are occurring on the node where the parent is running. Please note the following when interpreting the data below: (A) the defunct processes appear only on the node where the parent Perl script is running, (B) 'mubrex_mpi_biow' is the name of the Perl script, and (C) for the sake of brevity this data is from a run with 13 subprocesses, NOT 24 as in the code above.

p243~/>ps -u user
  PID TTY          TIME CMD
10319 ?        00:00:00 tcsh
10320 ?        00:00:00 pbs_demux
10341 ?        00:00:00 439291.biobos.S
10367 ?        00:02:09 mubrex_mpi_biow
20933 ?        00:00:00 mubrex_mpi_biow
20934 ?        00:00:00 rsh
20935 ?        00:00:00 mubrex_mpi_biow
20936 ?        00:00:00 rsh
20937 ?        00:00:00 mubrex_mpi_biow
20938 ?        00:00:00 rsh
20939 ?        00:00:00 mubrex_mpi_biow
20940 ?        00:00:00 rsh
20941 ?        00:00:00 mubrex_mpi_biow
20942 ?        00:00:00 rsh
20944 ?        00:00:00 mubrex_mpi_biow
20946 ?        00:00:00 mubrex_mpi_biow
20947 ?        00:00:00 rsh
20948 ?        00:00:00 mubrex_mpi_biow
20949 ?        00:00:00 rsh
20950 ?        00:00:00 mubrex_mpi_biow
20951 ?        00:00:00 rsh
20952 ?        00:00:00 mubrex_mpi_biow
20953 ?        00:00:00 rsh
20954 ?        00:00:00 mubrex_mpi_biow
20955 ?        00:00:00 rsh
20956 ?        00:00:00 mubrex_mpi_biow
20958 ?        00:00:00 mubrex_mpi_biow
20945 ?        00:00:00 rsh
20957 ?        00:00:00 rsh
20959 ?        00:00:00 rsh
20960 ?        00:00:00 rsh <defunct>
20961 ?        00:00:00 rsh <defunct>
20962 ?        00:00:00 rsh <defunct>
20963 ?        00:00:00 tcsh
20964 ?        00:00:00 rsh <defunct>
20965 ?        00:00:00 rsh <defunct>
20968 ?        00:00:00 rsh <defunct>
20969 ?        00:00:00 rsh <defunct>
20972 ?        00:00:00 rsh <defunct>
20973 ?        00:00:00 rsh <defunct>
20974 ?        00:00:00 rsh <defunct>
20976 ?        00:00:00 rsh <defunct>
20978 ?        00:00:00 rsh <defunct>
20980 ?        00:00:00 rsh <defunct>
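
The listing above has no parent-PID column, so it does not show which process each defunct 'rsh' belongs to. A rough sketch of how that could be checked follows. The helper name zombies_by_parent is hypothetical, and the invocation assumes a ps that accepts -o pid,ppid,stat,comm (the stat keyword is a procps/Linux convention and may differ elsewhere).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given lines in "PID PPID STAT COMMAND" layout,
# return a hash ref mapping each parent PID to its zombie children.
sub zombies_by_parent {
    my (@lines) = @_;
    my %by_parent;
    for my $line (@lines) {
        my ($pid, $ppid, $stat, $comm) = split ' ', $line, 4;
        # Skip the header line and anything that is not a process row.
        next unless defined $stat && $pid =~ /^\d+$/;
        # A state starting with Z marks a defunct (zombie) process.
        push @{ $by_parent{$ppid} }, $pid if $stat =~ /^Z/;
    }
    return \%by_parent;
}

# Assumed invocation: list this user's processes with their parent PIDs.
my @ps = `ps -o pid,ppid,stat,comm -u $> 2>/dev/null`;
my $zombies = zombies_by_parent(@ps);
for my $ppid (sort { $a <=> $b } keys %$zombies) {
    print "parent $ppid has defunct children: @{ $zombies->{$ppid} }\n";
}
```

Grouping by parent PID would show whether the defunct entries hang off the top-level script or off one of the forked copies.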

Thanks! Ed


In reply to rsh <defunct> processes appear when using fork and system calls by whatwhat
