Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

While investigating a perl process, which seemed to be hung, I found that it was waiting in read(2) on a pipe opened to a dead child process spawned using backtick. The child process died a long time ago and was in the zombie state because the parent was stuck in read(2) and hadn't called wait(2) yet. It's easy enough to reproduce. I ran some tests and this is what I found:

- it happens only when spawning child processes using backtick; spawning using system seems to work fine.
- it happens when the child is a bash/sh script

The parent perl script - parent.pl
---------------------------------------
#!/usr/bin/perl

print "I AM PARENT\n";
my $x=`/root/a.sh`;
#my $x=`/root/b.pl`;
#my $x=`/root/a.out`;
#my $x=system("/root/a.sh");
print "PARENT EXITING\n";

Child script - a.sh
---------------------
#!/bin/sh
echo "I am a.sh......."
sleep 6000
echo "I am gonna die ........"
exit 123

Another child script but which is perl instead of bash - b.pl
---------------------------------------------------------------------
#!/usr/bin/perl
print "I am b.pl\n";
sleep(6000000);
print "I am gonna die...\n";

Now execute parent.pl. It creates a pipe to the STDOUT of the child process and waits in read(2). Then kill the child process; the parent still waits in read(2). One would expect the death of the child process to close the write end of the pipe, which would make read(2) return 0 and let the parent terminate too. Instead, read(2) is interrupted by SIGCHLD (strace shows this as ERESTARTSYS), gets restarted, and resumes waiting.

[root@onong ~]# ps aux | grep a.sh
root 23171 0.0 0.0 63860 1084 pts/4 S+ 03:12 0:00 /bin/sh /root/a.sh
root 23514 0.0 0.0 61176 804 pts/2 S+ 03:12 0:00 grep a.sh
[root@onong ~]# kill 23171
[root@onong ~]# ps aux | grep a.sh
root 23171 0.0 0.0 0 0 pts/4 Z+ 03:12 0:00 [a.sh] <defunct>    <----------------------- CHILD BECOMES ZOMBIE
root 23967 0.0 0.0 61176 804 pts/2 S+ 03:13 0:00 grep a.sh
[root@onong ~]#

Strace output of the parent process:

[root@onong ~]# strace ./test.pl
.
.
.
open("./test.pl", O_RDONLY) = 3
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff24ed2260) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=160, ...}) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
readlink("/proc/self/exe", "/usr/bin/perl"..., 4095) = 13
brk(0x170a1000) = 0x170a1000
read(3, "#!/usr/bin/perl\n\nprint \"I AM PAR"..., 4096) = 160
read(3, "", 4096) = 0
close(3) = 0
write(1, "I AM PARENT\n", 12I AM PARENT
) = 12
pipe([3, 4]) = 0
pipe([5, 6]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b61c66692e0) = 23171    <--------- CHILD SPAWNED
close(6) = 0
close(4) = 0
read(5, "", 4) = 0
close(5) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff24ed21a0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
read(3, "I am a.sh.......\n", 4096) = 17
read(3, 0x17082210, 4096) = ? ERESTARTSYS (To be restarted)    <-------------- DOESN'T RETURN 0
--- SIGCHLD (Child exited) @ 0 (0) ---
read(3,     <----------------- KEEPS WAITING IN READ(2)

The strange thing is that the same doesn't happen if the child process is not a bash script. In parent.pl, comment out the line which spawns a.sh and uncomment the line which spawns b.pl, which is a perl script. Run the test again.

[root@onong ~]# ps aux | grep b.pl
root 29350 0.0 0.0 77884 1488 pts/4 S+ 03:15 0:00 /usr/bin/perl /root/b.pl
root 29495 0.0 0.0 61176 756 pts/2 S+ 03:15 0:00 grep b.pl
[root@onong ~]# kill 29350
[root@onong ~]# ps aux | grep b.pl
root 30028 0.0 0.0 61176 748 pts/2 S+ 03:16 0:00 grep b.pl
[root@onong ~]#

Strace of parent process:

read(3, "#!/usr/bin/perl\n\nprint \"I AM PAR"..., 4096) = 160
read(3, "", 4096) = 0
close(3) = 0
write(1, "I AM PARENT\n", 12I AM PARENT
) = 12
pipe([3, 4]) = 0
pipe([5, 6]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ba7aabac2e0) = 29350
close(6) = 0
close(4) = 0
read(5, "", 4) = 0
close(5) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff4156e300) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
read(3, "", 4096) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(3) = 0
rt_sigaction(SIGHUP, {0x1, [], SA_RESTORER, 0x343740e7c0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x343740e7c0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x343740e7c0}, {SIG_DFL, [], 0}, 8) = 0
wait4(29350, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 29350
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x343740e7c0}, NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x343740e7c0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x343740e7c0}, NULL, 8) = 0
write(1, "PARENT EXITING\n", 15PARENT EXITING
) = 15
exit_group(0) = ?

I also ran the test with the child process being a C program. It works fine. So, it would seem that the perl interpreter is doing some special processing for sh/bash scripts?? Thanks.

Re: parent process stuck in read(2) on pipe opened to child process using backtick
by Eliya (Vicar) on Feb 14, 2012 at 10:12 UTC

    I cannot reproduce the problem (neither with bash nor with dash).   I get

    ...
    read(3, "I am a.sh.......\n", 4096) = 17
    read(3, "I am gonna die ........\n", 4096) = 24
    --- SIGCHLD (Child exited) @ 0 (0) ---
    read(3, "", 4096) = 0
    fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
    close(3) = 0
    ...

    (Perl 5.12.4, current versions of dash/bash as they ship with Ubuntu 11.10)

    Update: err wait... you're probably not killing the sleep, but the shell only (in which case the sleep subprocess may survive (depending on the shell settings)).  Try killing sleep directly, or kill the entire process group (use "ps fo pid,pgrp,comm" to find out, then "kill -15 -PGRP").
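A small sketch of the effect described above, assuming a POSIX sh and a sleep binary are on the PATH. The helper name timed_backtick is invented for this illustration. Backticks read until every process holding the write end of the pipe has closed it, so a background sleep that inherits stdout keeps the read blocked even though the shell itself has already exited:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run a command in backticks and time the call.
# Returns (captured output, elapsed seconds).
sub timed_backtick {
    my ($cmd) = @_;
    my $start = time;
    my $out   = `$cmd`;
    return ($out, time - $start);
}

# The shell exits right after the echo, but the background sleep has
# inherited stdout (the pipe), so the read only ends when sleep exits.
my ($out1, $t1) = timed_backtick(q{sh -c 'echo hello; sleep 2 &'});

# Same command, but sleep's stdout is pointed away from the pipe,
# so nothing keeps the write end open once the shell is gone.
my ($out2, $t2) = timed_backtick(q{sh -c 'echo hello; sleep 2 >/dev/null &'});

printf "sleep holds the pipe:  %.1f s\n", $t1;
printf "sleep redirected away: %.1f s\n", $t2;
```

On a typical Linux box the first call should take about two seconds and the second should return almost immediately.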

      You are absolutely right - can't thank you enough. I checked: sleep is holding on to the pipe, and that's why read(2) doesn't return. I can get really dumb at times - how hard was that to figure out :) Also, when the child is a perl script instead of bash/sh, killing it takes away the sleep too. That explains why I saw the issue only with bash/sh. Shouldn't bash/sh do the same, i.e., take care of cleaning up its child processes??? Maybe this is not the right forum for bash/sh but what's the harm in asking :)

        ... Maybe this is not the right forum for bash/sh but what's the harm in asking :)

        Yes, this is not a bash forum, and all in all, bash configuration is a rather complex topic...

        Anyhow, the easiest approach would probably be to add a line

        trap "kill 0" EXIT

        to your a.sh.  This sets up an exit handler which kills the current process group.

        Then run a.sh in a new process group (so you avoid killing the calling Perl script, too):

        my $x=`exec perl -e "setpgrp; exec '/root/a.sh'"`;

        (I'm not aware of any way to create a new process group from within the shell script itself, so I'm using Perl's setpgrp here.)
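As a variation on the same idea, the calling Perl script can put the child into its own process group itself and later signal the whole group. This is only a sketch: spawn_in_group is a made-up helper name, the sh -c command stands in for /root/a.sh, and the code assumes a Unix-like system where kill with a negative pid signals a process group.

```perl
use strict;
use warnings;

# Hypothetical helper: start @cmd in a new process group with its stdout
# connected to a pipe. Returns the child's pid (also the group id) and
# the read end of the pipe.
sub spawn_in_group {
    my (@cmd) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        setpgrp(0, 0);                          # become leader of a new group
        open(STDOUT, '>&', $writer) or die "dup failed: $!";
        close $writer;
        exec { $cmd[0] } @cmd or die "exec failed: $!";
    }
    close $writer;                              # parent keeps only the read end
    return ($pid, $reader);
}

# Stand-in for a.sh: a shell whose sleep would normally outlive the shell.
my ($pid, $fh) = spawn_in_group('sh', '-c', 'echo started; sleep 600');

my $first = <$fh>;              # first line of output; by now setpgrp has run
kill 'TERM', -$pid;             # negative pid: signal every process in the group
my @rest = <$fh>;               # reaches EOF because sleep was signalled too
waitpid($pid, 0);
my $signal = $? & 127;          # signal that terminated the child, if any

print "first line: $first";
print "child $pid terminated by signal $signal\n";
```

Reading the first line before signalling matters here: it guarantees the child has already called setpgrp. Code that may signal earlier would probably also want to call setpgrp($pid, $pid) in the parent to close that window.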

      Thanks, guys, for the insights so far. I have been trying to figure out the best way to work around my original issue, based on the input I received, but today I stumbled onto something really weird.

      My perl thread is stuck in read(2) again but this time it is because some other completely unrelated thread has a reference to that pipe.

      [root@onong 5227]# strace -s 1024 -p 5227
      Process 5227 attached - interrupt to quit
      read(28,

      The fd in question is a pipe:

      [root@onong tmp]# ls -l /proc/5227/fd
      total 0
      lr-x------ 1 root root 64 Feb 21 02:02 0 -> /dev/null
      l-wx------ 1 root root 64 Feb 21 02:02 1 -> /tmp/abc.log
      .
      .
      lr-x------ 1 root root 64 Feb 21 02:02 28 -> pipe:[28443]

      Following are the processes which have a reference to this pipe:

      [root@onong tmp]# lsof | grep 28443
      perl 5049 root 28r FIFO 0,6 28443 pipe
      ntpd 9936 ntp 28r FIFO 0,6 28443 pipe
      ntpd 9936 ntp 29w FIFO 0,6 28443 pipe
      dhcpd 10228 root 28r FIFO 0,6 28443 pipe
      java 15518 root 28r FIFO 0,6 28443 pipe

      As you can see, the ntpd process has the write-end of the pipe open!!!!

      Any ideas as to how this is even possible????

        Posting my findings with the hope that it would be of help to someone out there.

        system and backtick both fork a child process. Backtick differs from system in that it opens a pipe for gathering the child process/command’s output. And this is what perl doc on fork has to say about file descriptors:

        Any filehandles open at the time of the fork() will be dup()-ed. Thus, the files can be closed independently in the parent and child, but beware that the dup()-ed handles will still share the same seek pointer. Changing the seek position in the parent will change it in the child and vice-versa. One can avoid this by opening files that need distinct seek pointers separately in the child. On some operating systems, notably Solaris and Unixware, calling exit() from a child process will flush and close open filehandles in the parent, thereby corrupting the filehandles. On these systems, calling _exit() is suggested instead. _exit() is available in Perl through the POSIX module. Please consult your system's manpages for more information on this.

        So, whatever file descriptors/handles were open at the time of executing system/backtick are inherited by the child process/command (unless a descriptor is marked close-on-exec). This means that processes like ntpd/dhcpd/named etc., when started from the perl process, inherit its open file descriptors, including any pipes opened as part of backtick.
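To make the inheritance concrete, here is a rough sketch that checks whether a spawned child can see one end of a pipe, first with the close-on-exec flag cleared and then with it set. The helper names set_cloexec and child_sees_fd are invented for this example, and it uses an ordinary pipe() rather than the pipe that backtick creates internally, so it only illustrates the mechanism, not what perl does inside backtick.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# Hypothetical helper: switch the close-on-exec flag of a handle on or off.
sub set_cloexec {
    my ($fh, $on) = @_;
    my $flags = fcntl($fh, F_GETFD, 0);
    defined $flags or die "F_GETFD failed: $!";
    $flags = $on ? ($flags | FD_CLOEXEC) : ($flags & ~FD_CLOEXEC);
    fcntl($fh, F_SETFD, $flags) or die "F_SETFD failed: $!";
    return;
}

# Hypothetical helper: fork and exec a child perl (list form of system,
# so no shell is involved) and ask it whether descriptor $fd is open.
sub child_sees_fd {
    my ($fd) = @_;
    my $status = system($^X, '-MPOSIX', '-e',
        'exit(defined POSIX::dup($ARGV[0]) ? 0 : 1)', $fd);
    return $status == 0;
}

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $fd = fileno($writer);

set_cloexec($writer, 0);                    # flag cleared: fd survives exec
my $inherited_without_flag = child_sees_fd($fd);

set_cloexec($writer, 1);                    # flag set: kernel closes fd on exec
my $inherited_with_flag = child_sees_fd($fd);

print "write end is fd $fd\n";
print "seen by child without close-on-exec: ", ($inherited_without_flag ? "yes" : "no"), "\n";
print "seen by child with close-on-exec:    ", ($inherited_with_flag ? "yes" : "no"), "\n";
```

A long-running daemon started while the flag is cleared would keep that write end open for as long as it runs, which matches the ntpd situation described in this thread.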

        Somehow ntpd, in my case, also inherited the write end of the pipe, which is why the parent process, holding the read end, was stuck in read(2) indefinitely: ntpd is a long-running process. The parent does close its copy of the write end right after forking (the close(4) that follows clone() in the strace above), but opening the pipe and closing the write end is not one atomic operation, so presumably a child forked by another thread in between can inherit it:

        perl process
          backtick
            pipe()
              <---------- GAP HERE
            close() write-end of pipe

        The solution I used was to write a perl wrapper script which closes all fds except 0/1/2 and then execs the command, like this:

        `closed.pl service xyz restart`;

        There are multiple ways of closing fds. I used the tips from the following discussion : http://www.perlmonks.org/?node_id=476086
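closed.pl itself is not shown in this thread, so the following is only a guess at what such a wrapper could look like. The sub name close_extra_fds is made up, and the sketch assumes Linux, where /proc/self/fd lists the open descriptors; elsewhere it falls back to trying every descriptor up to a capped limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: close every descriptor above $keep_max
# (default 2, i.e. keep stdin, stdout and stderr).
sub close_extra_fds {
    my ($keep_max) = @_;
    $keep_max = 2 unless defined $keep_max;

    my @fds;
    if (opendir(my $dh, '/proc/self/fd')) {
        # Linux: list only the descriptors that are actually open.
        @fds = grep { /^\d+$/ } readdir($dh);
        closedir($dh);
    } else {
        # Fallback: try every possible descriptor, with a cap so that a
        # very large limit does not turn into a very long loop.
        my $max = POSIX::sysconf(POSIX::_SC_OPEN_MAX()) || 1024;
        $max = 65536 if $max > 65536;
        @fds = (0 .. $max - 1);
    }

    for my $fd (@fds) {
        # Closing a descriptor that is not open just fails with EBADF.
        POSIX::close($fd) if $fd > $keep_max;
    }
    return;
}

# Usage: closed.pl COMMAND [ARGS...]
if (@ARGV) {
    close_extra_fds();
    exec { $ARGV[0] } @ARGV or die "exec $ARGV[0] failed: $!\n";
}
```

Because the wrapper closes the descriptors before exec, the command it starts (and any daemon that command launches) begins with only 0/1/2 open, regardless of what the calling perl process had open at that moment.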

        Here's a link which describes a similar issue : http://tdistler.com/2010/06/18/stop-stealing-my-file-descriptors