mcogan1966 has asked for the wisdom of the Perl Monks concerning the following question:

I have the following section of code that supposedly opens a bunch of child processes by piping:
$| = 1;
$pid = open(first_child, "-|");
if ($pid) {
    # parent: read the merged output of all the children
    while (<first_child>) {
        if    ($_ =~ m/^Error/) { }                  # do nothing
        elsif ($_ =~ m/^Hits/)  { push(@hits, $_); }
        else                    { push(@line, $_); }
    }
}
else {
    # first child: spawn one grandchild per search
    my (@child, @cpid);
    foreach $i (0 .. $#gets) {
        $e = $gets[$i];
        $c = $searches{$e}{handler};
        $p = $FORM_DATA{p};
        $q = $FORM_DATA{q};
        $u = $searches{$e}{url};
        $caller = "perl $c $p $q $u";
        $cpid[$i] = open($child[$i], "|-");
        if ($cpid[$i]) { }                           # do nothing
        else {
            system($caller);
            exit;
        }
    }
    foreach (@child) { close($_); }
    exit;
}
Now, this seems to be working properly, as I get all my data just fine. But now I've added something new to each of the called perl programs, a sub-call to another program. The line in each program will look like:
$more = 'perl makemore.pl "$words" "$keywords"';
The program makemore.pl seems to work just fine as it is. My problem is with data being pushed to the @hits array. When I comment out the perl call in the called programs, I get all my data in @hits just fine. But with the line activated, some of the called programs are not pushing their data into @hits.

This would be annoying enough, but the problem doesn't seem to correlate with anything I can see. As the processes are split off, their running times vary, so the order of the results being returned may not always be the same. Another oddity: if I run just one sub-process in the loop above, it works OK. In fact, I can sometimes run multiple sub-calls. The problem is that in some cases, running multiple sub-calls seems to blow away that one piece of data, while other data from the sub-calls comes back just fine.

Addendum:

This is no longer a 2-pipe situation, as I've made the change to forking the child processes off from the first pipe.

I have tried other ideas suggested here, but they don't provide any further help. ForkManager isn't an option, as we don't have it installed here.

Also, when I set $| = 1; I can see the output as it forms, line by line. It's taking a second or two to generate the lines of output from @hits, but even then, it's still managing to skip one.

Read This !!!: After significant debugging with the help of someone who is much more foo in these ways than I, it has been discovered that the smashing up of my output comes from corrupted data, nothing more. So, I thank all of you who have given input. In the end, my coding will be better for this, and I have a few new ideas to mull over. The end result is that the entire process is being redesigned almost completely from scratch to hopefully better account for the poorly formatted data.

Again, thanks to all who have responded, please don't hate me for this. :)

Replies are listed 'Best First'.
Re: Processes clobbering each other
by holo (Monk) on Nov 24, 2003 at 21:45 UTC

    First off, try:

    use strict;
    use warnings 'all';

    Second, you might have a typo in your code at the second regexp condition. Should that read:
    elsif ($_ =~ m/^Hits/) { push (@hits, $_) ; }
    instead of:
    elsif ($_ =~ m/^Hits) { push (@hits, $_) ; } ?

    Finally, what does makemore.pl print out? Is it working correctly? Does it work correctly when called multiple times concurrently?

      First, ok, I'll give that a try, just to see what I get.

      Second, yeah, that's a typo. Sorry.

      And finally, makemore.pl just returns a string. It works just fine, as the output from the string is being used in other output, and it is showing up correctly. Hence the reason it goes into a variable when called.

        You are using open, so the children's output is not going to STDOUT but to a pipe. Closing all those pipes without reading them means the sub-processes' output never reaches the parent's parent, since they are on different pipes. You should probably use fork to fork(tm) without redirecting output to a different pipe. I managed to confuse myself with that last phrase, so here's an idea:

        $| = 1;
        $pid = open(first_child, "-|");
        if ($pid) {
            while (<first_child>) {
                if    ($_ =~ m/^Error/) { }                  # do nothing
                elsif ($_ =~ m/^Hits/)  { push(@hits, $_); }
                else                    { push(@line, $_); }
            }
        }
        else {
            my @cpid;
            foreach $i (0 .. $#gets) {
                $e = $gets[$i];
                $c = $searches{$e}{handler};
                $p = $FORM_DATA{p};
                $q = $FORM_DATA{q};
                $u = $searches{$e}{url};
                $caller = "perl $c $p $q $u";
                $cpid[$i] = fork;        # plain fork instead of a pipe open
                unless ($cpid[$i]) {
                    system($caller);
                    exit;
                }
            }
            exit;
        }

        Big warning! Untested code. I don't understand the task that you are trying to accomplish but I (hope) can now see what's wrong. This might not run but should give you an idea.

        BTW: Why don't you use ForkManager to accomplish this task? It simplifies debugging, as it makes your code clearer and gives you more control over your child processes.

Re: Processes clobbering each other
by thospel (Hermit) on Nov 24, 2003 at 22:49 UTC
    As far as I can see, all your processes funnel into the same pipe. This means that their output will be intermingled, and I mean really intermingled, not just on a line by line basis. So if one program says "aaa\nbbb\n" and the other says "ccc\nddd\n", in principle you may get "aaca\nbcbb\nc\nddd\n".

    Depending on the OS, often only one message can be in transit on a pipe at any one time; if the output is, for example, line-buffered and fits entirely in the pipe, you often won't notice this possibility.

    But if output blobs can get big, or are block-buffered, some other messages might succeed in mixing in. I suspect that this is what is happening here, and if you study your arrays carefully enough, you will find the missing hits as strings in the @line array (or not at all, if they got added to an Error).

    Several solutions are possible. E.g. if you control the target programs tightly enough and all lines are shorter than 512 bytes, you can make sure to always flush them. Or you can set up multiple pipes and collect from them using select() or poll(), though you'll have to collect lines yourself in that case (things like POE can help there). Or you can redirect all output to a file per program, and process these files when the programs are done.
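
    A compact sketch of that last file-per-program idea (untested; @programs and the file names are placeholders):

    my @programs = ('perl handler1.pl', 'perl handler2.pl');  # hypothetical
    my (@files, @pids);
    for my $i (0 .. $#programs) {
        my $file = "/tmp/search_output.$$.$i";    # one file per program
        push @files, $file;
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                          # child
            open STDOUT, '>', $file or die "open $file: $!";
            exec $programs[$i] or die "exec failed: $!";
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;                      # let them all finish
    for my $file (@files) {
        open my $fh, '<', $file or die "open $file: $!";
        # process complete, un-interleaved lines here
        close $fh;
        unlink $file;
    }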

    PS, notice that you should also check for the defined-ness of $pid to make sure the implied fork worked. Also, your child pipe setup is overly complicated since you don't really use these pipes (currently). You can use a fork/exec there.
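
    For instance (untested sketch; $caller is the command string built in the original loop):

    my $cpid = fork;
    die "fork failed: $!" unless defined $cpid;   # the definedness check
    if ($cpid == 0) {                             # child
        exec $caller or die "exec '$caller' failed: $!";
    }
    # parent keeps running; reap with waitpid($cpid, 0) when done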

    Update:

    Here is an example of a multi-pipe select based solution:

    #!/usr/bin/perl -w
    use strict;
    use IO::Select;

    use constant READ_SIZE => 8192;

    my @program = qw(cat /etc/passwd);
    my $select  = IO::Select->new();
    my %collect_line;
    for (1..4) {
        open(my $fh, "-|", @program) || die "Could not start @program: $!";
        $select->add($fh);
        $collect_line{$fh} = "";
    }

    sub line {
        my $line = shift;
        # Do your per line processing here
        print "Process $line";
    }

    while (%collect_line) {
        for my $fh ($select->can_read) {
            my $rc = sysread($fh, $collect_line{$fh}, READ_SIZE,
                             length($collect_line{$fh}));
            die "Read error $! from pipe ??" if !defined($rc);
            if ($rc) {
                line($1) while $collect_line{$fh} =~ s/^(.*\n)//;
            }
            else { # EOF
                line($collect_line{$fh}) if $collect_line{$fh} ne "";
                delete $collect_line{$fh};
                $select->remove($fh);
                close($fh);
                die "Unexpected $? returncode from @program\n" if $?;
            }
        }
    }
      So if one program says "aaa\nbbb\n" and the other says "ccc\nddd\n", in principle you may get "aaca\nbcbb\nc\nddd\n".

      No, that is guaranteed not to happen. Unix pipes will not break up a single write(2) request to a pipe if it is smaller than the system buffer size for pipes (at least 512 bytes, perhaps more like 4kB). So unless you somehow manage to take more than one write(2) to output the "aaa\n", it will not get any other data interleaved inside of it.

      To get Perl to use more than one write(2) when outputting "aaa\n", you'd have to set $| to a true value and use more than one Perl statement to output those 4 characters.

                      - tye
        Sure, that's why I later on say that linebuffered output of less than 512 bytes is safe. I was just explaining the concept and (over)simplifying things a bit.

        Notice, by the way, that when writing STDOUT to a tty, Perl is line-buffered, and print "aaa\nbbb\n" will in fact become two writes, even without setting $|. But the real problem, of course, is the target program becoming block-buffered (most likely with a block size that fits in a pipe, which nowadays is usually 4K), with the block not ending on a line boundary.

        To get Perl to use more than one write(2) when outputting "aaa\n", you'd have to set $| to a true value and use more than one Perl statement to output those 4 characters.
        Though, there is a very distinct chance that this is exactly what is happening. Hard to say for sure. So, I tried commenting out the
        $| = 1;
        line. Didn't work though.
      Oooh, that was bad. The results are even worse. I'm lucky if I get even one engine to return a hit count, and I'm not getting ANYTHING else from the output. I think I'm going to stick with the previous design for calling the sub-programs, as it does at least make the calls and returns MOST of the data.
Re: Processes clobbering each other
by iburrell (Chaplain) on Nov 24, 2003 at 22:27 UTC
    One problem I see is that the parent process is reading from all of the child processes through the "first_child" file handle. This makes it easy for output from the children to be mixed together. This is especially true since output to a pipe is block-buffered by default, so writes tend to split lines. One thing to try is unbuffering the output in the child processes; then they will do one write per line. This may not work if the lines are longer than what the pipe will transfer atomically, though.
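
    That would mean something like this near the top of each called program, not just the parent:

    # In each child program, so every print is flushed as soon as the
    # line is complete instead of sitting in a block buffer:
    $| = 1;                      # autoflush the selected handle (STDOUT)

    # or, equivalently:
    use IO::Handle;
    STDOUT->autoflush(1);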

    Also, why are you using the piped open for the second call? You never write to the stdin of the subprocesses. And there is no point to having a child process that just waits for the system to return. A fork/exec would work just as well.

      In the above response from holo, that is exactly what I'm doing now. As for unbuffering, I thought that's what I was doing when I had the line $| = 1;. Am I wrong here?
Re: Processes clobbering each other
by graff (Chancellor) on Nov 25, 2003 at 03:55 UTC
    ... this seems to be working properly, as I get all my data just fine. But now I've added something new to each of the called perl programs, a sub-call to another program. The line in each program will look like:
    $more = 'perl makemore.pl "$words" "$keywords"';
    The program makemore.pl seems to work just fine as it is...

    It's not clear whether you have the right quotes on this added feature. If those are supposed to be backticks around "perl makemore.pl ... ..." (i.e.  qx/perl makemore.pl "$words" "$keywords"/;), then it ought to be okay -- but it looks like you have plain-old single-quotes (apostrophes), which would not work...
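
    For comparison (assuming $words and $keywords are set as in the called programs), the three spellings behave very differently:

    my $literal = 'perl makemore.pl "$words" "$keywords"';   # single quotes: just a
                                                             # string, nothing runs,
                                                             # no interpolation
    my $output  = `perl makemore.pl "$words" "$keywords"`;   # backticks: runs the
                                                             # command, captures STDOUT
    my $same    = qx/perl makemore.pl "$words" "$keywords"/; # same as backticks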

      Yes, they are backticks in that line.

      Sorry that the code tags didn't show that properly.

Re: Processes clobbering each other
by Anonymous Monk on Nov 25, 2003 at 13:14 UTC
    Have you checked that you haven't reached your limit of concurrent processes?
      I'm not seeing any indication that this is the case. I'm only forking off about 6 processes, and each one of them will only have one additional process at any given time. I doubt this is what is happening.

      However, I would like to know how I might go about verifying this. What should I add to my debug code to check? Thanks for the idea.
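
      One idea I'll try: perldoc -f fork says fork returns undef and sets $! when it fails, so I can at least log that in the loop:

      my $pid = fork;
      unless (defined $pid) {
          # at the per-user process limit, $! is typically
          # "Resource temporarily unavailable" (EAGAIN)
          warn "fork failed: $!\n";
      }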