in reply to Processes clobbering each other

As far as I can see, all your processes funnel into the same pipe. This means that their output will be intermingled, and I mean really intermingled, not just on a line by line basis. So if one program says "aaa\nbbb\n" and the other says "ccc\nddd\n", in principle you may get "aaca\nbcbb\nc\nddd\n".

Depending on the OS, often only one message can be in transit on a pipe at any one time, so if the output is, for example, line-buffered and each message fits entirely in the pipe, you often won't notice this possibility.

But if output blobs can get big, or are block-buffered, other messages may get mixed in. I suspect that this is what is happening here, and if you study your arrays carefully enough, you will find the missing hits as strings in the @line array (or not at all if they got added to an Error).

Several solutions are possible. E.g. if you control the target programs tightly enough and all lines are shorter than 512 bytes, you can make sure to always flush them. Or you can set up multiple pipes and collect from them using select() or poll(), though you'll have to collect lines yourself in that case (things like POE can help there). Or you can redirect all output to a file per program, and process these files when the programs are done.
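As a rough sketch of the file-per-program option: the sub below forks one child per command, points each child's STDOUT at its own file, waits for all of them, and returns the file names in command order. The name run_to_files, the out.N file naming and the echo commands are all made up for illustration; substitute your real target programs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Run every command concurrently, each writing to its own file.
# Returns the output file names (in command order) once all
# children have exited. Hypothetical helper, not from the original code.
sub run_to_files {
    my ($dir, @commands) = @_;
    my (@files, @pids);
    for my $command (@commands) {
        my $file = sprintf "%s/out.%d", $dir, scalar @files;
        my $pid = fork();
        die "Could not fork: $!" if !defined $pid;
        if ($pid == 0) {
            # Child: point STDOUT at the private file, then exec.
            open(STDOUT, ">", $file) or die "Could not open $file: $!";
            exec(@$command) or die "Could not exec @$command: $!";
        }
        push @files, $file;
        push @pids, $pid;
    }
    # Reap all children; only then are the files complete.
    for my $pid (@pids) {
        waitpid($pid, 0);
        warn "Unexpected $? returncode from pid $pid\n" if $?;
    }
    return @files;
}

# Stand-in commands, assumed for the example only.
my @programs = ([qw(echo aaa)], [qw(echo ccc)]);

my $dir = tempdir(CLEANUP => 1);
for my $file (run_to_files($dir, @programs)) {
    open(my $fh, "<", $file) or die "Could not open $file: $!";
    while (my $line = <$fh>) {
        print "Process $line";
    }
    close($fh);
}
```

Since every program has a private file, nothing can interleave, at the price of not seeing any output until all programs have finished. If the parent has already printed to STDOUT before forking, flush it first (or set $|), otherwise the buffered text may be written again by each child.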

PS, notice that you should also check for the defined-ness of $pid to make sure the implied fork worked. Also, your child pipe setup is overly complicated since you don't really use these pipes (currently). You can use a fork/exec there.
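To illustrate the $pid check, here is a minimal sketch of a forking open. I am assuming your code does something along these lines; start_reader and the echo command are invented names for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start @command with its STDOUT connected to a pipe we can read.
# Returns the child pid and the read handle, or dies if the fork failed.
# Hypothetical helper for illustration.
sub start_reader {
    my @command = @_;
    # The forking open returns undef on failure, 0 in the child,
    # and the child's pid in the parent.
    my $pid = open(my $fh, "-|");
    die "Could not fork: $!" if !defined $pid;
    if ($pid == 0) {
        # Child: STDOUT is already the write end of the pipe.
        exec(@command) or die "Could not exec @command: $!";
    }
    return ($pid, $fh);
}

my ($pid, $fh) = start_reader(qw(echo hello));
print "Process $_" while <$fh>;
# close() on a piped handle waits for the child and sets $?.
close($fh) or warn "Child $pid exited with status $?\n";
```

Without the defined() test, a failed fork gives undef, which looks just like the 0 of the child branch, so the parent would exec the target program itself.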

update.

Here is an example of a multi-pipe select based solution:

#!/usr/bin/perl -w
use strict;
use IO::Select;

use constant READ_SIZE => 8192;

my @program = qw(cat /etc/passwd);

my $select = IO::Select->new();
my %collect_line;
for (1..4) {
    open(my $fh, "-|", @program) || die "Could not start @program: $!";
    $select->add($fh);
    $collect_line{$fh} = "";
}

sub line {
    my $line = shift;
    # Do your per line processing here
    print "Process $line";
}

while (%collect_line) {
    for my $fh ($select->can_read) {
        my $rc = sysread($fh, $collect_line{$fh}, READ_SIZE,
                         length($collect_line{$fh}));
        die "Read error $! from pipe ??" if !defined($rc);
        if ($rc) {
            line($1) while $collect_line{$fh} =~ s/^(.*\n)//;
        } else {
            # EOF
            line($collect_line{$fh}) if $collect_line{$fh} ne "";
            delete $collect_line{$fh};
            $select->remove($fh);
            close($fh);
            die "Unexpected $? returncode from @program\n" if $?;
        }
    }
}

Re^2: Processes clobbering each other (atoms)
by tye (Sage) on Nov 24, 2003 at 23:14 UTC
    So if one program says "aaa\nbbb\n" and the other says "ccc\nddd\n", in principle you may get "aaca\nbcbb\nc\nddd\n".

    No, that is guaranteed not to happen. Unix pipes will not break up a single write(2) request to a pipe if it is smaller than the system buffer size for pipes (at least 512 bytes, perhaps more like 4kB). So unless you somehow manage to take more than one write(2) to output the "aaa\n", it will not get any other data interleaved inside of it.

    To get Perl to use more than one write(2) when outputting "aaa\n" you'd have to set $| to a true value and use more than one Perl statement to output those 4 characters.

                    - tye
      Sure, that's why I later on say that line-buffered output of less than 512 bytes is safe. I was just explaining the concept and (over)simplifying things a bit.

      Notice, by the way, that when STDOUT is a tty, perl is line-buffered, and print "aaa\nbbb\n" will in fact become two writes, even without setting $|. But the real problem, of course, is the target program becoming block-buffered (most likely with a block size that fits in a pipe, which nowadays is usually 4K) while the block does not end on a line boundary.

      To get Perl to use more than one write(2) when outputting "aaa\n" you'd have to set $| to a true value and use more than one Perl statement to output those 4 characters.
      Though, there is a very distinct chance that this is exactly what is happening. Hard to say for sure. So, I tried commenting out the
      $| = 1;
      line. Didn't work though.
Re: Re: Processes clobbering each other
by mcogan1966 (Monk) on Nov 25, 2003 at 15:08 UTC
    Oooh, that was bad. The results are even worse. I'm lucky if I get even one engine to return a hit count, and I'm not getting ANYTHING else from the output. I think I'm going to stick with the previous design for calling the sub-programs, as it does at least make the calls and returns MOST of the data.