Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that currently repeats several processes over and over and writes the results to a file. I want to speed this up by forking processes and having them all write to the same file via an open append filehandle. However, I am afraid the children will clobber some of each other's output in the file. Is this an issue I need to be worried about? When everything prints to stdout it appears not to clobber any of the output.

If this is an issue, how can I get around it? I looked into flock, but all children inherit the parent's lock on an fcntl platform (I am using Linux).

Any help is greatly appreciated!

Re: Forking and writing to files
by tachyon (Chancellor) on May 13, 2004 at 02:23 UTC

    You may find this ForkList package handy; I use it to process large lists in parallel. Assuming you have one or more files you want to process in parallel, safely appending data from the kids to the output file(s), it provides the infrastructure.

    use ForkList;

    our $URLS;

    for my $list (@LISTS) {
        $URLS = get_lines($list);
        run( \&callback, scalar(@$URLS), { kids => $KIDS, verbose => $VERBOSE } );
    }

    sub callback {
        my ( $child, $i ) = @_;
        my $url    = $URLS->[$i];
        my ($code) = cache_url( $url, $FORCE, $TIMEOUT );
        append_file( "$LIST.$code", "$url\n", $child, $i, $VERBOSE );
        return 0;
    }

    Here is the package.

    package ForkList;

    use strict;
    $|++;
    use POSIX qw[ WNOHANG ];
    use Fcntl ':flock';
    use Time::HiRes 'usleep';
    use vars qw( @ISA @EXPORT $VERSION );

    $VERSION = "0.01";
    require Exporter;
    @ISA    = qw(Exporter);
    @EXPORT = qw( run get_lines append_file );

    # this function allows you to parallel process N lines of data stored
    # in an array using N kids (default 25). It makes a callback to the
    # caller with args ( CHILD_NUM, INDEX, DBH ).
    # The DBH is optional and needs to be passed as an option
    #   { dbh => \&connect_db }
    # when passed the DBH option each kid will have its own DBH to play with

    sub run {
        my ( $CALLBACK, $LINES, $OPTS ) = @_;
        my $KIDS    = $OPTS->{kids}    ? $OPTS->{kids}    : 25;
        my $VERBOSE = $OPTS->{verbose} ? $OPTS->{verbose} : 0;
        my $DBH     = $OPTS->{dbh}     ? $OPTS->{dbh}     : 0;
        my $SLEEP   = $OPTS->{sleep}   ? $OPTS->{sleep}   : 0;
        my $start   = time();

        # we only want as many kids as there are lines as a max
        $KIDS = $LINES if $KIDS > $LINES;

        for my $child ( 0 .. $KIDS - 1 ) {
            my $pid = fork();
            defined $pid or do { warn 'Fork Failed!'; sleep 1; next };
            if ( $pid == 0 ) {
                # Child Process
                warn "Starting child $child of $KIDS\n" if $VERBOSE;

                # give each kid its own database handle if required
                my $dbh = &$DBH if $DBH;

                # make sure we disconnect from db no matter how we exit
                local $SIG{TERM} = sub {
                    $dbh->disconnect() if $dbh;
                    warn "Kid $child Killed!\n";
                    exit 1;
                };
                local $SIG{INT} = $SIG{TERM};

                # get every nth page from the array depending on child id
                for ( my $i = $child ; $i < $LINES ; $i += $KIDS ) {
                    warn "Child $child is processing [$i]\n" if $VERBOSE > 1;
                    # catch any errors that occur during callback
                    eval { &$CALLBACK( $child, $i, $dbh ) };
                    $SLEEP && usleep($SLEEP);
                }

                warn "Child $child has finished and is exiting\n" if $VERBOSE;
                $dbh->disconnect() if $dbh;
                exit 0;    # prevent child from spawning more
            }
        }

        warn 'Waiting for children...' if $VERBOSE;
        usleep(500) until waitpid( -1, &WNOHANG ) == -1;
        warn sprintf "Done in %d seconds!\n", time() - $start if $VERBOSE;
    }

    sub get_lines {
        my $file = shift;
        open F, $file or die "Can't open $file $!\n";
        chomp( my @lines = <F> );
        close F;
        return \@lines;
    }

    # this function allows kids to safely append to a file
    # we are using locking to avoid race conditions
    sub append_file {
        my ( $file, $data, $child, $i, $VERBOSE ) = @_;
        open OUT, ">>$file" or die "Can't write $file $!\n";
        warn "Child $child waiting for lock at $i\n" if $VERBOSE && $VERBOSE > 2;
        flock( OUT, LOCK_EX ) or die "Child $child could not lock at $i $!\n";
        seek OUT, 0, 2;    # make sure we are at EOF before appending
        print OUT $data;
        close OUT;         # will release lock
    }

    1;

    cheers

    tachyon

Re: Forking and writing to files
by NetWallah (Canon) on May 13, 2004 at 04:34 UTC
    I'd strongly recommend using "threads" and Thread::Queue.

    This provides a clean OO interface, simplifies your programming, and reduces debugging and maintenance headaches.

    Basically, you would start a FileOutput thread that reads from the queue and writes to the file. The other worker threads write into the queue; they have no knowledge of the file. Very simple, clean interface.
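
    A minimal sketch of that arrangement (the filename results.txt and the worker payloads are invented for illustration):

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $q = Thread::Queue->new();

        # Dedicated FileOutput thread: the only code that touches the file.
        my $writer = threads->create( sub {
            open my $fh, '>>', 'results.txt' or die "Can't append to results.txt: $!";
            while ( defined( my $line = $q->dequeue() ) ) {
                print $fh $line;
            }
            close $fh;
        } );

        # Worker threads know nothing about the file; they just enqueue lines.
        my @workers = map {
            threads->create( sub {
                my $id = shift;
                $q->enqueue("result from worker $id\n");
            }, $_ );
        } 1 .. 4;

        $_->join() for @workers;
        $q->enqueue(undef);    # an undef item tells the writer to finish up
        $writer->join();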

    Offense, like beauty, is in the eye of the beholder, and a fantasy.
    By guaranteeing freedom of expression, the First Amendment also guarantees offense.
Re: Forking and writing to files
by coec (Chaplain) on May 13, 2004 at 02:09 UTC
    You could try adding a non-blocking wait and looping until each child successfully writes to the output file.

    I dimly recall using semaphores to overcome a similar problem when I did some C programming about 12 years ago.
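
    The same idea can be sketched in Perl with the core IPC::Semaphore module; this is only a sketch (the filename and child payloads are invented), using one binary semaphore as a mutex around each append:

        use strict;
        use warnings;
        use IPC::SysV qw(IPC_PRIVATE IPC_CREAT S_IRUSR S_IWUSR);
        use IPC::Semaphore;

        # One binary semaphore created before forking, inherited by the kids.
        my $sem = IPC::Semaphore->new( IPC_PRIVATE, 1, S_IRUSR | S_IWUSR | IPC_CREAT )
            or die "Can't create semaphore: $!";
        $sem->setval( 0, 1 );    # start unlocked

        for my $child ( 1 .. 4 ) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            next if $pid;            # parent keeps forking

            $sem->op( 0, -1, 0 );    # P: wait for and take the semaphore
            open my $out, '>>', 'results.txt' or die "Can't append: $!";
            print {$out} "output from child $child\n";
            close $out;
            $sem->op( 0, 1, 0 );     # V: release it for the next child
            exit 0;
        }

        wait() for 1 .. 4;
        $sem->remove();              # clean up the SysV semaphore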

    CC

Re: Forking and writing to files
by Zaxo (Archbishop) on May 13, 2004 at 08:39 UTC

    You might get some ideas from an old thing of mine, Many-to-One pipe. Just let the child append to the file.
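
    Roughly, the many-to-one arrangement in that node looks like this sketch (filename invented): the kids share the write end of one pipe, and the parent is the only process that touches the file.

        use strict;
        use warnings;

        pipe( my $reader, my $writer ) or die "pipe failed: $!";

        for my $child ( 1 .. 4 ) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ( $pid == 0 ) {    # child: write end only
                close $reader;
                # writes smaller than PIPE_BUF are atomic, so whole lines
                # from different kids won't interleave
                print {$writer} "output from child $child\n";
                close $writer;
                exit 0;
            }
        }

        close $writer;            # parent holds only the read end
        open my $out, '>>', 'results.txt' or die "Can't append: $!";
        while ( my $line = <$reader> ) {
            print {$out} $line;   # one writer, so no locking needed
        }
        close $out;
        wait() for 1 .. 4;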

    After Compline,
    Zaxo

Re: Forking and writing to files
by pbeckingham (Parson) on May 13, 2004 at 02:04 UTC

    You'll need to close that file as well as lock it. If you can change your program so that it writes larger chunks less often, then this will become less of an issue.
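
    In other words, something like this per-write discipline (a minimal sketch; the locked_append name is made up):

        use strict;
        use warnings;
        use Fcntl qw(:flock :seek);

        # Each child opens, locks, appends, and closes for every write, so
        # locks are never shared across the fork and are always released.
        sub locked_append {
            my ( $file, $data ) = @_;
            open my $fh, '>>', $file or die "Can't open $file: $!";
            flock( $fh, LOCK_EX )    or die "Can't lock $file: $!";
            seek( $fh, 0, SEEK_END );    # another writer may have appended while we waited
            print {$fh} $data;
            close $fh;                   # closing the handle releases the lock
        }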