delirium has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on some code to run some comm sessions in parallel, and update a database when all the sessions have completed. The database contains things like filenames, last run times, etc.

The problem I'm running into is that I'm reading the database into a hash, and dumping the hash back to the database at the end of the script. The hash copies in each fork don't update the parent's hash, and I'm left with the hash as it was at load time being re-written. Here is some dumbed-down code to illustrate:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Parallel::ForkManager;

my $pm = new Parallel::ForkManager(10);
my $update_flag = 0;

$pm->run_on_finish( sub {
    my (undef, $exit_code, $ident) = @_;
    $update_flag = 1 if $exit_code;
} );

my %sess_hist = ();
my %hash;
&load_database;

for my $session (keys %{ $hash{Session} }) {
    my $pid = $pm->start($session) and next;
    if (&check_overdue($session)) {
        &run_session($session);  # &run_session updates %sess_hist with new stats
    }
    else {
        exit(0);
    }
    $pm->finish($session);
}
$pm->wait_all_children;
&save_database if $update_flag;

I'd be more than happy to ditch Data::Dumper in favor of a simple database, but the ones I've played with (NDBM_File, SDBM_File, etc.) all seem to do a final untie() at the end to re-write the database, putting me back in the same boat.

What's a good way out of this with the least amount of module installation?

Thanks.

Update:

Wow, that was really a "Bread good, fire bad" moment. Yes, my "database" is nothing more than a Data::Dumper printout. The easy solution occurred to me on the drive home: child processes create a new hash of things to change, then re-read the original hash from file, merge the changes, and re-save.

Next time: more caffeine, less knee-jerk question posting.
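That merge step might look something like this. A sketch only: the sub name merge_and_save, the file locking, and the { session => { field => value } } change-hash layout are my assumptions, not the original poster's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Data::Dumper;

# Hypothetical helper: each child collects its updates in its own small
# hash, then re-reads the master file, overlays the changes, and re-saves.
sub merge_and_save {
    my ($dbfile, $changes) = @_;   # $changes = { session => { field => value } }

    open my $fh, '+<', $dbfile or die "open $dbfile: $!";
    flock $fh, LOCK_EX or die "flock $dbfile: $!";  # serialize the children

    local $/;                      # slurp mode
    my $dump = <$fh>;
    my $VAR1;                      # Data::Dumper writes "$VAR1 = {...};"
    my $master = (defined $dump && $dump =~ /\S/) ? eval $dump : {};
    die "could not parse $dbfile: $@" if $@;

    # Overlay this child's changes on the freshly re-read master copy
    for my $session (keys %$changes) {
        $master->{$session}{$_} = $changes->{$session}{$_}
            for keys %{ $changes->{$session} };
    }

    seek $fh, 0, 0;
    truncate $fh, 0;
    print {$fh} Dumper($master);
    close $fh or die "close $dbfile: $!";
}
```

The exclusive flock matters here: without it, two children finishing at the same moment could each read the same master copy and the slower writer would silently drop the faster one's changes.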

Replies are listed 'Best First'.
Re: Forking and database updates
by mpeppler (Vicar) on Nov 11, 2003 at 22:12 UTC
    If I understand you correctly, you want the data that each child writes out to the "database" to be picked up by the parent when it re-reads the "database". If that is the case, then the first thing you need to think of is that the data from the different child processes needs to be merged so that all the changes are preserved; alternatively, write one file per child and read those files back in the parent.

    Alternatively, if you want your child process changes to be reflected directly in the parent, then you need to use some form of shared memory setup, or have the parent read data from each of the child processes and update its data that way.
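    A minimal sketch of that second variant, with one pipe per child. The "key=value" report format and the session names are made up for illustration; the point is only that the parent folds each child's report into its own hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %sess_hist;
my @sessions = qw(alpha beta);     # hypothetical session names

# Parent opens a pipe per child; each child prints its updates as
# simple "key=value" lines before exiting.
my %readers;
for my $session (@sessions) {
    pipe(my $r, my $w) or die "pipe: $!";
    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {               # child
        close $r;
        # ... run the comm session here ...
        print {$w} "$session.last_run=", time, "\n";
        close $w;
        exit 0;
    }
    close $w;                      # parent keeps only the read end
    $readers{$pid} = $r;
}

# Parent merges every child's report into its own copy of the hash
for my $pid (keys %readers) {
    my $r = $readers{$pid};
    while (my $line = <$r>) {
        chomp $line;
        my ($key, $value) = split /=/, $line, 2;
        $sess_hist{$key} = $value;
    }
    close $r;
    waitpid $pid, 0;
}
```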

    Michael

Re: Forking and database updates
by Roger (Parson) on Nov 11, 2003 at 22:56 UTC
    Reminds me of a resource serialization problem I had back in the old uni days, where I had to synchronize data between parent/children.

    One of the solutions was to use an external database, where database I/O's were handled by an independent (external) process.

    Another solution was to use shared memory (I was coding it in C then). I think it must be our liz who has posted this module on CPAN (well done liz) - forks::shared. It states that:

    The "forks::shared" pragma allows a developer to use shared variables with threads (implemented with the "forks" pragma) without having to have a threaded perl, or to even run 5.8.0 or higher.

    An example pulled from CPAN:

    use forks;
    use forks::shared;

    my $variable : shared;
    my @array    : shared;
    my %hash     : shared;

    share( $variable );
    share( @array );
    share( %hash );

    lock( $variable );
    cond_wait( $variable );
    cond_signal( $variable );
    cond_broadcast( $variable );
    Take a look at this module, I think this might be a more elegant solution to your problem.

Re: Forking and database updates
by perrin (Chancellor) on Nov 11, 2003 at 22:07 UTC
Re: Forking and database updates
by blue_cowdawg (Monsignor) on Nov 11, 2003 at 21:42 UTC

        The hash copies in each fork don't update the parent's hash, and I'm left with the hash as it was at load time being re-written. Here is some dumbed-down code to illustrate:

    If I understand your statement correctly, I would not expect modifications of data within a child to be "felt" back at the parent. When you fork a child, it receives a copy of the parent's data space as of the fork. When the child terminates, any changes to that copy are lost with it.

    Now, if your parent reads the database after the children update it and doesn't see changes then you have a completely different problem set.
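    That copy-on-fork behavior is easy to demonstrate with a few lines (a made-up example, not the poster's code): the child's write never reaches the parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %h = ( count => 0 );

defined(my $pid = fork) or die "fork: $!";
if ($pid == 0) {
    $h{count} = 42;     # modifies the child's private copy only
    exit 0;
}
waitpid $pid, 0;
print "count is $h{count}\n";   # the parent still sees 0
```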


    Peter L. Berghold -- Unix Professional
    Peter at Berghold dot Net
       Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.