liverpole has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow monks,

I've recently come across some interesting behavior in a script I wrote to monitor a build process.  The actual build command (a top-level make, on RedHat Linux) is spawned as a pipe so the script can monitor it.  At the end of the build, after a tarball is created, the entire source tree is CVS-tagged.  This tagging takes a lot of time, so I then fork a child process to wait for the build to complete, to allow the parent to do other processing.  What's happening, though, is that the parent is somehow getting blocked.

I discovered the problem goes away when the filehandle used for the IO::Select (to read output from the build) is made global; otherwise it is constructed as a deeply-bound lexical variable within the process-monitoring closure.  Can anyone shed some light on why this filehandle apparently blocks the parent process after the fork, even though the parent never calls the closure again, and thus never interacts with the filehandle?

I've simplified the original program makebuild from 900 lines to 175 lines, and I use a script fakebuild to simulate the build process.  When I set the variable $b_use_global_fh to zero on line 14, the parent process says "(1a) Parent about to return", but doesn't show that it has returned from the subroutine with "(2) Parent back from monitor_build()" until much later.  When $b_use_global_fh is set to 1, the parent process returns immediately, and the two lines appear one after the other in the output.

My question is "why would the behavior be different, depending on how the FileHandle object is scoped?"

Here is the fakebuild script:

#!/usr/bin/perl -w
#
# Perform a 'fake build' to test the 'makebuild' script.
# 060222 liverpole
#

# Strict
use strict;
use warnings;

# Flush output
$|++;

# Things to pretend to build
my @modules = qw( admin drivers tools );

# Do a simulated 'make clean'
map { unlink("$_.txt") } @modules;      # Unlink each module's logfile
unlink "cvstag.txt";                    # This one's special

# Pretend to build everything
sleep 1;
foreach (@modules) {
    print STDERR "Building module '$_' ...\n";
    system("touch $_.txt");
    print STDERR " - phase 1\n";
    sleep 1;
    print STDERR " - phase 2\n";
    sleep 1;
    print STDERR " - phase 3\n";
    sleep 1;
}

# Pretend to perform the tagging procedure
print STDERR "CVS tagging ...\n";
system("touch cvstag.txt");
for (my $i = 1; $i <= 16; $i++) {
    sleep 1;
    print STDERR "Tagging file $i\n";
}

And here is the makebuild script:

#!/usr/bin/perl -w
#
# Test program to demonstrate the differing behavior when a global
# filehandle is used, vs. a deeply-bound lexical in a forked closure.
#
# 060222 liverpole
#

# Strict
use strict;
use warnings;

# User-defined
my $b_use_global_fh = 0;

# Libraries
use File::Basename;
use FileHandle;
use IO::Select;

# Declarations
sub monitor_build;
sub monitor_logfiles;
sub system_command;

# Globals
$| = 1;
my $iam = basename $0;
my $global_fh;

####################
### Main program ###
####################

# Remove the previous build
system("rm -f *.txt");

# Construct a global filehandle
$global_fh = new FileHandle;

# Perform build -- fork occurs within subroutine, and only parent returns
print STDERR "Starting build\n";
monitor_build("./fakebuild");
print STDERR "\e[101m(2) Parent back from monitor_build()\e[m\n";

# Parent can now tend to other business...
print STDERR "Notifying users of build completion ...\n";
sleep 1;
print STDERR "Adding build version to Bugzilla ...\n";
sleep 1;
print STDERR "Building .iso images ...\n";
sleep 1;

###################
### Subroutines ###
###################

#
# Inputs:  $1 ... the build command
#
# Results: Issues the build command to the system and monitors its progress.
#
sub monitor_build {
    my ($bld_cmd) = @_;

    my @logfiles   = qw( admin drivers tools cvstag );
    my $plogs      = { map { "$_" . ".txt", 1 } @logfiles };
    my $psyscmd    = system_command("$bld_cmd 2>&1");
    my $b_finished = 0;

    while (1) {
        my $ptext = $psyscmd->(8);
        last unless $ptext;
        foreach my $line (@$ptext) {
            print "Text from build [$line]\n";
        }

        if (!$b_finished && monitor_logfiles($plogs)) {
            # The build has finished (except for the CVS tagging phase),
            # so we fork, to let the parent return and complete other tasks
            # (eg. user-notify, adding version number to Bugzilla, etc.)
            # The child continues until the tagging is complete.  We also
            # set $b_finished so this block doesn't get executed again.
            #
            my $pid = fork;
            defined($pid) or die "$iam: unable to fork!\n";
            if ($pid) {
                # Parent
                print STDERR "\e[101m(1a) Parent about to return\e[m\n";
                return;
            }

            # Child
            print STDERR "\e[102m(1b) Child continues tagging\e[m\n";
            $b_finished = 1;
        }

        select(undef, undef, undef, 0.250);
    }

    print STDERR "\e[102m(3) Child is finished -- exiting\e[m\n";
    exit;       # The build finished -- no need for the child to continue
}

#
# Inputs:  $1 ... a pointer to the hash of logfile names
#
# Outputs: $1 ... nonzero if CVS tagging has started ("cvstag.txt" has
#                 been created), zero otherwise.
#
# Results: finds any build logfiles have been written to, logs their names
#          and date, and removes them from the hash.
#
sub monitor_logfiles {
    my ($plogs) = @_;

    my %modified;
    foreach my $logfile (keys %$plogs) {
        (-e $logfile) and $modified{$logfile} = (stat($logfile))[9];
    }

    my $b_cvs_tagging = 0;
    map {
        print STDERR "Started log '$_'\n";
        delete $plogs->{$_};
        ($_ eq 'cvstag.txt') and $b_cvs_tagging = 1;
    } sort { $modified{$a} <=> $modified{$b} } (keys %modified);

    return $b_cvs_tagging;
}

#
# Inputs:  $1 ... a command to issue to the shell
#
# Outputs: $1 ... a closure which reads successive lines of process output
#
# Results: Opens an output pipe from the given command and returns a closure
#          which reads non-blocking text from the pipe and returns a pointer
#          to it.  The closure takes a single argument -- the maximum number
#          of lines of text to read on each call (0 = unlimited).  A zero is
#          returned when the command finishes.
#
sub system_command {
    my ($cmd) = @_;

    # Create a pipe to the command
    my $fh = $b_use_global_fh? $global_fh: new FileHandle;
    open($fh, "$cmd|") or die "$iam: Cannot pipe to command '$cmd' ($!)\n";

    # Create a Select object
    my $select = IO::Select->new();
    $select->add($fh);

    my $b_done_syscmd = 0;

    # Create the monitoring closure
    my $psub = sub {
        my ($maxlines) = @_;
        $b_done_syscmd and return 0;
        my @lines;
        while (1) {
            last unless $select->can_read(0);
            defined(my $line = <$fh>) or $b_done_syscmd = 1;
            last if $b_done_syscmd;
            chomp $line;
            push @lines, $line;
            last if ($maxlines > 0 && @lines == $maxlines);
            select(undef, undef, undef, 0.250);
        }
        return \@lines;
    };

    return $psub;
}

@ARGV=split//,"/:L"; map{print substr crypt($_,ord pop),2,3}qw"PerlyouC READPIPE provides"

Re: Why is a deeply-bound lexical in a forked closure blocking the parent process?
by Anonymous Monk on Feb 23, 2006 at 20:08 UTC
    I may be full of it; it's been a long time since I messed around with this in Perl, and back then I wasn't using pipes this way. I was using files.

    The parent blocks because $psyscmd goes out of scope upon the return in the parent process, and perl tries to destroy the filehandle (because the only living reference to it is from the structure referenced by $psyscmd) but finds its close() blocking on the child.

    In the global case, the parent does not try to close() the file so it doesn't show up, but in the lexical case it does, because perl wants all open files to close as soon as they go out of scope (in case you rely on the behavior and try to open them again).

    The file is (after fork()) held open by two processes, and perl doesn't want to lie to you about the success of close(), nor fail just because someone else is looking at the file, so (apparently) it blocks.
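
    A minimal sketch of the lexical case, assuming a hypothetical `sleep 2` command in place of the build. As far as I know, perlfunc's entry for close says that closing a piped-open handle waits for the process on the pipe to exit, and the implicit close that happens when the last reference goes away appears to do the same. No fork is needed to see the delay:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the build command: runs for about 2 seconds.
my $cmd = 'sleep 2';

sub start_and_forget {
    # Lexical filehandle on a piped open, much like the one in system_command()
    open(my $fh, '-|', $cmd) or die "Cannot pipe from '$cmd' ($!)\n";
    return;     # $fh goes out of scope here, so perl closes it implicitly
}

my $start = time;
start_and_forget();
my $elapsed = time - $start;

printf "Returning from the sub took %.1f seconds\n", $elapsed;
```

    If this is right, the reported time should be roughly the run time of the command rather than close to zero.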

    If you return a reference to $psyscmd it shouldn't block, because $psyscmd doesn't then have to be destroyed, and you can sync up at a later point (like when leaving the program).

    I don't know that you can open a pipe handle with shared semantics, but that might be another way to close your handle without blocking, if in fact you can and perl is smart enough to know that you did (I'm sure perl is smart, it's whether you can share your end of the pipe).

    Or maybe I'm just dead wrong and wasting valuable electrons... I'll be interested to see what others think.

    You should be able to test my hypothesis by issuing an intentional close() on the filehandle in the parent before you return (and putting in another print). If it blocks there, then I'm at least a little bit right.
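
    A rough sketch of that test in isolation, again assuming a hypothetical `sleep 2` command instead of the real build. It only times an explicit close() on a piped-open handle; the variable names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the build command
my $cmd = 'sleep 2';

open(my $fh, '-|', $cmd) or die "Cannot pipe from '$cmd' ($!)\n";

print STDERR "About to close the pipe\n";
my $start   = time;
my $ok      = close($fh);       # expected to wait until the command exits
my $elapsed = time - $start;

printf STDERR "close() returned %s after %.1f seconds\n",
    ($ok ? 'true' : 'false'), $elapsed;
```

    In makebuild the equivalent would be a close() plus an extra print just before the parent's return.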

      Your answer is not only at least a little bit right, it's exactly on target!  Excellent analysis! ++

      I read your response yesterday, and was fairly sure that you had hit the nail on the head.  But I wanted to do it justice by testing further, and as I expected, the program succeeds when another reference is created to the filehandle.

      First, I made sure $b_use_global_fh was zero, and changed system_command() to return both $psyscmd and the filehandle $fh, so I could create a reference to $fh to keep it open:

          my ($psyscmd, $fh) = system_command("$bld_cmd 2>&1");
          $global_fh = $fh;

      That worked exactly as you predicted.  It also worked when I simply had the parent process return the filehandle:

          if ($pid) {
              # Parent
              print STDERR "\e[101m(1a) Parent about to return\e[m\n";
              return $fh;
          }

      but only, of course, when the reference was preserved after the return from monitor_build:

          my $unused_fh = monitor_build("./fakebuild");

      So I thank you for a very enlightening solution.  It taught me something important about closures; namely, that they close their open filehandles which go out of scope, just the way a program does when it exits.  Nicely written!

      Update:  I just realized (and just tested as correct) that another way to keep the filehandle from attempting to close is for the parent to return the reference to the closure:

          my $psyscmd = system_command("$bld_cmd 2>&1");
          # ...
          if ($pid) {
              # Parent
              print STDERR "\e[101m(1a) Parent about to return\e[m\n";
              return $psyscmd;
          }

      Then, as long as it is saved in the main program, this technique works as well.  It also avoids having to fuss with the filehandle directly.
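
      A stripped-down sketch of the same idea, assuming a hypothetical `sleep 2` command and a made-up start_reader() in place of system_command(). The closure holds the only reference to the filehandle, so the wait should be postponed until the closure itself is released:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the build command
my $cmd = 'sleep 2';

# Hypothetical helper: returns a closure that reads one line from the pipe
sub start_reader {
    open(my $fh, '-|', $cmd) or die "Cannot pipe from '$cmd' ($!)\n";
    return sub { return scalar <$fh> };   # closure keeps $fh alive
}

my $start    = time;
my $reader   = start_reader();            # saved, so $fh is not destroyed
my $t_return = time - $start;

# ... the parent could tend to other business here ...

$start = time;
undef $reader;                            # last reference gone: implicit close
my $t_release = time - $start;

printf "start_reader() returned after %.1f seconds\n", $t_return;
printf "Releasing the closure took %.1f seconds\n",    $t_release;
```

      The blocking does not disappear; it moves to wherever the last reference is dropped, which in makebuild would be program exit.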

        Hi. This is my first time on perlmonks - I am a co-worker of liverpole's, and he and I have been discussing his question. I hope I am not out-of-line by continuing the discussion.

        (I was close-but-off when we discussed it here at work. I'd surmised it was I/O blocking of some type, but had presumed it was related to code for output that he's since eliminated. It wasn't the problem.)

        I hadn't considered file closures - because to my limited understanding of UNIX file systems (liverpole is using one of the Red Hat Linux systems), file closure was relatively independent from one system process to another. A process on UNIX will close a file, any unwritten buffers will write out, any reference counts will be decremented, and the process will continue. (The kernel may take further action if reference counts reach zero, but that is outside of the process, and irrelevant to it.)

        Clearly, you have demonstrated that there is a coordination of file closures, and that the main parent was blocking on closure at an inconvenient time (at the end of the local scope). It doesn't surprise me, therefore, that moving the file handle or keeping a reference count to the file handle until the program exits will postpone that blocking until it no longer matters. (Other than efficiency, who cares if the parent program blocks during exit, until the child also exits?)

        But - what in the world is causing/permitting the coordination between the two processes? Is it a Perl construct to do so? What if the child, instead of simply forking, performed an exec as well - would that prevent or reduce any coordination?

        I'm truly convinced that there is unexpected coordination of the files on file closure (whether that file closure is implicit or explicit). But I'm not enough of a Frater of Perl to know how or why such coordination is taking place.

        Help? Don't worry, liverpole will tell me if anyone replies. Thanks for reading, and tolerating. And answering.

Re: Why is a deeply-bound lexical in a forked closure blocking the parent process?
by ruoso (Curate) on Feb 23, 2006 at 18:01 UTC

    I must admit that I ++'d your post just because of its intriguing title... :)

    daniel