cyadav has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks! I need your help finding the problem in the code below.


For a set of hosts, for each host:

1. I fork a child process and create a pipe.

2. The writing end of the pipe (a file handle) is passed to the child.

3. The reading end of the pipe is added to an IO::Select object in the parent.


In the parent, I wait for the number of child processes to fall below the maximum allowed, and I also read from whichever file handles are readable (via the select object's can_read method).

The data read is merged into a hash ref.

But after all the children have finished and the hash ref, which should hold every child's data, is dumped, sometimes it has data from all the hosts, and sometimes data from some hosts is missing.


I need your help finding out what I'm doing wrong. Here's the code.

#!/usr/bin/env perl
use warnings;
use strict;
use Getopt::Long;
use POSIX ":sys_wait_h";
use IO::Select;
use IO::Pipe;
use Fcntl;
use Data::Dumper;

my $h_r;
my $c_max;
my $sel = IO::Select->new();

sub sel_can_read {
    # args : timeout
    my ($t) = @_;
    # select object ($sel), hashref ($h_r)
    # @rh : ready handles, $h : handle
    while ( my @rh = $sel->can_read($t) ) {
        foreach my $h (@rh) {
            my $o  = join( '', $h->getlines() );
            my $_h = eval($o);
            $h_r = _merge_hash( $_h, $h_r );
            # only one read per handle, so close
            $sel->remove($h);
            $h->close;
        }
    }
}

sub wait_num_child_below_max {
    my ( $child_aref, $c_m ) = @_;
    while ( scalar( @{$child_aref} ) > $c_m ) {
        foreach my $c ( @{$child_aref} ) {
            @{$child_aref} = grep { $_ ne $c } @{$child_aref}
                if ( waitpid( $c, WNOHANG ) < 0 );
            # read child's fh
            sel_can_read(0);    # non-blocking
        }
    }
    return;
}

sub _merge_hash {
    # args : list of hashrefs to be merged
    # returns merged hashref
    my $h_r;
    foreach (@_) {
        if ( defined($h_r) ) {
            $h_r = { %$h_r, %$_ };
        }
        else {
            $h_r = $_;
        }
    }
    return $h_r;
}

sub do_some_stuff_for_host {
    my ( $fh, $host ) = @_;
    my %h;
    # do some stuff, populate (hash) %h
    print $fh Data::Dumper->new( [ \%h ] )->Terse(1)->Dump;
}

sub print_report {
    print STDOUT Data::Dumper->new( [$h_r] )->Terse(1)->Dump;
}

sub main {
    my @a = ( 'host1', 'host2' );
    my @c_p;
    foreach my $i (@a) {
        wait_num_child_below_max( \@c_p, $c_max );
        my $pipe = IO::Pipe->new();
        my $pid  = fork();
        if ($pid) {    # parent
            push @c_p, $pid;
            my $fh = $pipe->reader();
            $fh->blocking(0);    # non-blocking read
            $sel->add($fh);
        }
        elsif ( $pid == 0 ) {    # child
            my $cfh = $pipe->writer();
            do_some_stuff_for_host( \*$cfh, $i );
            exit(0);
        }
    }
    # leave no child behind
    wait_num_child_below_max( \@c_p, 0 );
    print_report();
}

main();

Replies are listed 'Best First'.
Re: IO::Select, IO::Pipe, fork, Data loss
by RichardK (Parson) on Aug 05, 2015 at 12:52 UTC

    Modifying a list you're using in a for loop is a really bad idea.

    foreach my $c ( @{$child_aref} ) {
        @{$child_aref} = grep { $_ ne $c } @{$child_aref}
            if ( waitpid( $c, WNOHANG ) < 0 );

    The help in perlsyn says this :-

    If any part of LIST is an array, "foreach" will get very confused if you add or remove elements within the loop body, for example with "splice". So don't do that.
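    A minimal sketch of the safe pattern (the sub and variable names here are illustrative, not from the OP's code): mark the finished pids first, then rebuild the array in a single pass outside the foreach.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Reap exited children without modifying the array
# while foreach is still iterating over it.
sub reap_children {
    my ($child_aref) = @_;
    my %done;
    foreach my $pid (@$child_aref) {
        # waitpid returns the pid once the child has exited
        # (or -1 if it is not our child); 0 means still running.
        $done{$pid} = 1 if waitpid( $pid, WNOHANG ) != 0;
    }
    # one rebuild, outside the foreach
    @$child_aref = grep { !$done{$_} } @$child_aref;
    return scalar keys %done;    # number reaped
}
```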

      Ok, I've corrected that, but I think that's not going to resolve the problem.

        There's only one way to find out, stop guessing and test it ;)

Re: IO::Select, IO::Pipe, fork, Data loss
by Anonymous Monk on Aug 05, 2015 at 17:08 UTC

    Bunch of don'ts:

    Don't use ->getlines when using select, use sysread instead.

    Don't use non-blocking, it's not needed.

    Don't forget to check eval for failure - especially from a partial (non-blocking) read.

    Don't forget to define $c_max :)

    Don't forget about $child_pid = open(my $fh, "-|") self-forking open if you are just reading from a child.
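    To make the eval point concrete, here's a minimal sketch (the sub name is hypothetical): eval the accumulated child output once, and treat a set $@ — which is exactly what a truncated, partial read produces — as a failed host instead of silently merging undef.

```perl
use strict;
use warnings;
use Data::Dumper;

# Turn one child's complete output back into a hashref.
# Returns undef (and warns) if eval fails or the result
# is not a hashref - e.g. after a truncated read.
sub parse_child_output {
    my ($buf) = @_;
    my $data = eval $buf;
    if ($@) {
        warn "eval of child output failed: $@";
        return undef;
    }
    return ref $data eq 'HASH' ? $data : undef;
}
```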

      #!/usr/bin/perl
      # http://perlmonks.org/?node_id=1137500
      use IO::Select;
      use YAML;
      use strict;
      use warnings;

      $| = 1;
      my $sel = IO::Select->new;
      my ( %hash, %id );
      my @a   = map "host$_", 1 .. 9;
      my $max = 4;
      while ( @a or $sel->count ) {
          if ( @a and $sel->count < $max ) {
              my $host = shift @a;
              if ( open my $fh, '-|' ) {
                  $id{$fh} = $host;
                  $sel->add($fh);
                  $hash{ $id{$fh} } .= "starting $host ";
                  print "running ", $sel->count, "\n";
              }
              else {
                  print "hello from the $host child";
                  exit;
              }
          }
          elsif ( $sel->count ) {
              for my $fh ( $sel->can_read ) {
                  if ( 0 < sysread $fh, my $buf, 1024 ) {
                      $hash{ $id{$fh} } .= $buf;
                  }
                  else {
                      $sel->remove($fh);
                      close $fh;
                  }
              }
          }
      }
      print Dump \%hash;

      See, simpler with forked open instead of IO::Pipe when only reading from the child :)

Re: IO::Select, IO::Pipe, fork, Data loss
by anonymized user 468275 (Curate) on Aug 05, 2015 at 15:53 UTC
    can_read won't change its return value, so the while loop testing it will execute zero times or loop forever.

    One world, one people

Re: IO::Select, IO::Pipe, fork, Data loss
by Anonymous Monk on Aug 05, 2015 at 16:50 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1137500
    use IO::Select;
    use IO::Pipe;
    use YAML;
    use strict;
    use warnings;

    $| = 1;
    my $sel = IO::Select->new;
    my ( %hash, %id );
    my @a   = map "host$_", 1 .. 9;
    my $max = 4;
    while ( @a or $sel->count ) {
        if ( @a and $sel->count < $max ) {
            my $host = shift @a;
            my $pipe = IO::Pipe->new;
            if ( my $pid = fork ) {
                my $fh = $pipe->reader;
                $id{$fh} = $host;
                $sel->add($fh);
                $hash{ $id{$fh} } .= "starting $host ";
                print "running ", $sel->count, "\n";
            }
            elsif ( defined $pid ) {
                print { $pipe->writer } "hello from the $host child";
                exit;
            }
            else {
                die "fork failed: $!";
            }
        }
        elsif ( $sel->count ) {
            for my $fh ( $sel->can_read ) {
                if ( 0 < sysread $fh, my $buf, 1024 ) {
                    $hash{ $id{$fh} } .= $buf;
                }
                else {
                    $sel->remove($fh);
                    close $fh;
                }
            }
        }
    }
    print Dump \%hash;