cyadav has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks! I need your help finding the problem in the code below.


For a set of hosts, for each host:

1. I fork a child process and create a pipe.

2. The writing end of the pipe (a file handle) is passed to the child.

3. The reading end of the pipe is added to an IO::Select object in the parent.


In the parent, I wait for the number of child processes to fall below the maximum allowed, and I also read from whichever file handles are readable (via the select object's can_read method).

The data read is merged into a hash ref.

But after all the children have finished and the hash ref, which should hold every child's data, is dumped, sometimes it has data from all the hosts, and sometimes data from some hosts is missing.


I need your help finding out what I'm doing wrong. Here's the code.

#!/usr/bin/env perl
use warnings;
use strict;
use Getopt::Long;
use POSIX ":sys_wait_h";
use IO::Select;
use IO::Pipe;
use Fcntl;
use Data::Dumper;

my $h_r;
my $c_max;
my $sel = IO::Select->new();

sub sel_can_read {
    # args : timeout
    my ($t) = @_;
    # select object ($sel), hashref ($h_r)
    # @rh : ready handles, $h : handle
    while ( my @rh = $sel->can_read($t) ) {
        foreach my $h (@rh) {
            my $o  = join( '', $h->getlines() );
            my $_h = eval($o);
            $h_r = _merge_hash( $_h, $h_r );
            # only one read per handle, so close
            $sel->remove($h);
            $h->close;
        }
    }
}

sub wait_num_child_below_max {
    my ( $child_aref, $c_m ) = @_;
    while ( scalar( @{$child_aref} ) > $c_m ) {
        foreach my $c ( @{$child_aref} ) {
            @{$child_aref} = grep { $_ ne $c } @{$child_aref}
                if ( waitpid( $c, WNOHANG ) < 0 );
            # read child's fh
            sel_can_read(0);    # non-blocking
        }
    }
    return;
}

sub _merge_hash {
    # args : list of hashrefs to be merged
    # returns merged hashref
    my $h_r;
    foreach (@_) {
        if ( defined($h_r) ) {
            $h_r = { %$h_r, %$_ };
        }
        else {
            $h_r = $_;
        }
    }
    return $h_r;
}

sub do_some_stuff_for_host {
    my ( $fh, $host ) = @_;
    my %h;
    # do some stuff, populate (hash) %h
    print $fh Data::Dumper->new( [ \%h ] )->Terse(1)->Dump;
}

sub print_report {
    print STDOUT Data::Dumper->new( [$h_r] )->Terse(1)->Dump;
}

sub main {
    my @a = ( 'host1', 'host2' );
    my @c_p;
    foreach my $i (@a) {
        wait_num_child_below_max( \@c_p, $c_max );
        my $pipe = IO::Pipe->new();
        my $pid  = fork();
        if ($pid) {    # parent
            push @c_p, $pid;
            my $fh = $pipe->reader();
            $fh->blocking(0);    # non-blocking read
            $sel->add($fh);
        }
        elsif ( $pid == 0 ) {    # child
            my $cfh = $pipe->writer();
            do_some_stuff_for_host( \*$cfh, $i );
            exit(0);
        }
    }
    # leave no child behind
    wait_num_child_below_max( \@c_p, 0 );
    print_report();
}

main();

Replies are listed 'Best First'.
Re: IO::Select, IO::Pipe, fork, Data loss
by RichardK (Parson) on Aug 05, 2015 at 12:52 UTC

    Modifying a list you're using in a for loop is a really bad idea.

    foreach my $c ( @{$child_aref} ) {
        @{$child_aref} = grep { $_ ne $c } @{$child_aref}
            if ( waitpid( $c, WNOHANG ) < 0 );

    The help in perlsyn says this :-

    If any part of LIST is an array, "foreach" will get very confused if you add or remove elements within the loop body, for example with "splice". So don't do that.
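    A minimal sketch of the safe pattern (the sub and variable names here are illustrative, not from the OP's code): mark the finished pids first, then rebuild the array in a single pass outside the foreach.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Reap exited children without modifying the array
# while foreach is still iterating over it.
sub reap_children {
    my ($child_aref) = @_;
    my %done;
    foreach my $pid (@$child_aref) {
        # waitpid returns the pid once the child has exited
        # (or -1 if it is not our child); 0 means still running.
        $done{$pid} = 1 if waitpid( $pid, WNOHANG ) != 0;
    }
    # one rebuild, outside the foreach
    @$child_aref = grep { !$done{$_} } @$child_aref;
    return scalar keys %done;    # number reaped
}
```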

      Ok, I've corrected that, but I think that's not going to resolve the problem.

        There's only one way to find out, stop guessing and test it ;)

Re: IO::Select, IO::Pipe, fork, Data loss
by Anonymous Monk on Aug 05, 2015 at 17:08 UTC

    Bunch of don'ts:

    Don't use ->getlines when using select, use sysread instead.

    Don't use non-blocking, it's not needed.

    Don't forget to check eval for failure - especially from a partial (non-blocking) read.

    Don't forget to define $c_max :)

    Don't forget about $child_pid = open(my $fh, "-|") self-forking open if you are just reading from a child.
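    To make the eval point concrete, here's a minimal sketch (the sub name is hypothetical): eval the accumulated child output once, and treat a set $@ — which is exactly what a truncated, partial read produces — as a failed host instead of silently merging undef.

```perl
use strict;
use warnings;
use Data::Dumper;

# Turn one child's complete output back into a hashref.
# Returns undef (and warns) if eval fails or the result
# is not a hashref - e.g. after a truncated read.
sub parse_child_output {
    my ($buf) = @_;
    my $data = eval $buf;
    if ($@) {
        warn "eval of child output failed: $@";
        return undef;
    }
    return ref $data eq 'HASH' ? $data : undef;
}
```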

      #!/usr/bin/perl
      # http://perlmonks.org/?node_id=1137500
      use IO::Select;
      use YAML;
      use strict;
      use warnings;

      $| = 1;
      my $sel = IO::Select->new;
      my ( %hash, %id );
      my @a   = map "host$_", 1 .. 9;
      my $max = 4;
      while ( @a or $sel->count ) {
          if ( @a and $sel->count < $max ) {
              my $host = shift @a;
              if ( open my $fh, '-|' ) {
                  $id{$fh} = $host;
                  $sel->add($fh);
                  $hash{ $id{$fh} } .= "starting $host ";
                  print "running ", $sel->count, "\n";
              }
              else {
                  print "hello from the $host child";
                  exit;
              }
          }
          elsif ( $sel->count ) {
              for my $fh ( $sel->can_read ) {
                  if ( 0 < sysread $fh, my $buf, 1024 ) {
                      $hash{ $id{$fh} } .= $buf;
                  }
                  else {
                      $sel->remove($fh);
                      close $fh;
                  }
              }
          }
      }
      print Dump \%hash;

      See, simpler with forked open instead of IO::Pipe when only reading from the child :)

Re: IO::Select, IO::Pipe, fork, Data loss
by anonymized user 468275 (Curate) on Aug 05, 2015 at 15:53 UTC
    can_read won't change its return value, so the while loop testing it will execute zero times or loop forever.

    One world, one people

Re: IO::Select, IO::Pipe, fork, Data loss
by Anonymous Monk on Aug 05, 2015 at 16:50 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1137500
    use IO::Select;
    use IO::Pipe;
    use YAML;
    use strict;
    use warnings;

    $| = 1;
    my $sel = IO::Select->new;
    my ( %hash, %id );
    my @a   = map "host$_", 1 .. 9;
    my $max = 4;
    while ( @a or $sel->count ) {
        if ( @a and $sel->count < $max ) {
            my $host = shift @a;
            my $pipe = IO::Pipe->new;
            if ( my $pid = fork ) {
                my $fh = $pipe->reader;
                $id{$fh} = $host;
                $sel->add($fh);
                $hash{ $id{$fh} } .= "starting $host ";
                print "running ", $sel->count, "\n";
            }
            elsif ( defined $pid ) {
                print { $pipe->writer } "hello from the $host child";
                exit;
            }
            else {
                die "fork failed: $!";
            }
        }
        elsif ( $sel->count ) {
            for my $fh ( $sel->can_read ) {
                if ( 0 < sysread $fh, my $buf, 1024 ) {
                    $hash{ $id{$fh} } .= $buf;
                }
                else {
                    $sel->remove($fh);
                    close $fh;
                }
            }
        }
    }
    print Dump \%hash;