learnedbyerror has asked for the wisdom of the Perl Monks concerning the following question:
Oh Monks,
Yet again, I am contemplating going where the wise fear to tread - should I override fork?
I am building a module that provides shared tied variables, backed by BerkeleyDB, that keep working across a fork call. I would like using it to be as seamless as possible. However, the only way I can currently see to achieve that is to override Perl's fork function, a la forks::BerkeleyDB.
The thought of overriding a core function is, as it should be, raising the hair on the back of my neck and making me nervous.
My thought is to do something like:
    use Carp qw(croak);

    sub _fork {
        ### safely sync/close databases, close environment ###
        _untie_shared_vars();
        _close_BerkeleyDB_env();

        ### do the fork ###
        my $pid = CORE::fork;

        if ( !defined $pid ) {    # fork failed: check this first, since undef is also false
            croak("Unable to fork: $!");
        }
        elsif ($pid) {            # in parent
            ### re-open environment and immediately retie shared variables ###
            _open_BerkeleyDB_env();
            _tie_share_vars();
        }
        else {                    # in child ($pid == 0)
            ### open environment and immediately tie shared variables ###
            _open_BerkeleyDB_env();
            _tie_share_vars();
        }
        return $pid;
    }
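For context, here is a minimal sketch of how a wrapper like `_fork` could be installed so that it only affects packages that load the module, rather than every `fork` in the program. It relies on Perl's rule that a built-in can be overridden by importing a sub of the same name at compile time. The package name `My::SharedFork` and the `_before_fork`/`_after_fork` hooks are hypothetical stand-ins for the untie/close and open/retie steps. This is not necessarily how forks::BerkeleyDB does it.

```perl
use strict;
use warnings;

# Hypothetical module name, for illustration only.
package My::SharedFork;

our @LOG;    # records hook calls so the behaviour can be inspected

sub _before_fork { push @LOG, 'close' }    # stand-in for untie + close environment
sub _after_fork  { push @LOG, 'open' }     # stand-in for open environment + retie

sub _fork {
    _before_fork();
    my $pid = CORE::fork();    # always the real built-in
    _after_fork();             # runs in parent, child, and after a failed fork
    return $pid;               # undef on failure, 0 in child, child PID in parent
}

# Installing the sub from another package counts as an import, which is
# what lets it take the place of the built-in in the importing package.
sub import {
    my $caller = caller;
    no strict 'refs';
    *{"${caller}::fork"} = \&_fork;
}

package main;

# Equivalent to "use My::SharedFork;" if the package lived in its own file.
BEGIN { My::SharedFork->import }

my $pid = fork();              # resolves to My::SharedFork::_fork here
die "Unable to fork: $!" unless defined $pid;
if ( $pid == 0 ) { exit 0 }    # child: nothing more to do in this sketch
waitpid $pid, 0;               # parent: reap the child
```

Assigning to `*CORE::GLOBAL::fork` in a `BEGIN` block instead would replace `fork` everywhere in the program, including in unrelated modules. That is the more drastic form of the override.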
My question to you: am I missing a less drastic option, or am I worrying too much about the override?
Thanks in advance for your guidance.
lbe
The following is a draft readme that I plan on including with the distribution.
This distribution is a work in progress (started 12/11/11). There will be more to come.
The goal of this distribution is to provide an easy means to share data structures between processes and threads. It does so using objects with convenience methods that are tied to BerkeleyDB hashes or recnos. This functionality already exists for threads via threads::shared, which uses shared memory (RAM). This distribution may still be useful with threads when the hashes and/or arrays are too large to be stored in RAM.
Additionally, this distribution provides a queue module, similar to Thread::Queue, that can be used across processes.
The data store of all objects is based upon Berkeley DB Concurrent Data Store (CDS). The module handles all locking needed to ensure that only a single writer is allowed at any one time. CDS was selected to favor speed over absolute integrity. This means that if an error occurs while a change is being written to the database, the database will be left in an uncertain state. Given the overall stability of the BerkeleyDB code, this is unlikely, but still possible. If absolute reliability is required, then one should use BerkeleyDB directly and make use of its Transactional Data Store (TDS) capability.
As stated above, it is the author's intent that this module be used between processes/threads; hence "thread safe" and "fork safe" are goals that must be achieved in order to be successful. Care has been taken to ensure that this module achieves this; however, given the lack of precise definitions for either thread or fork safety, it is very possible that the author has not adequately contemplated situations that may cause deadlocks or race conditions. As such, the author welcomes any feedback on problems, preferably with corrected code to address them and tests to validate the fix.
I started working on this distribution after trying many, if not every, module available on CPAN that supports shared data across processes. I found three things.
So, I decided to pull my thoughts together and roll something of my own. I have most of the base functionality working, having cobbled together portions of code from forks::BerkeleyDB and Thread::Queue and sub-classed BerkeleyDB::Hash and BerkeleyDB::Recno. The last big question that I have is how to make the implementation simple and easy to consume. My testing approach is functional: I provide methods that can be called explicitly to close my connections before forking and to re-open them in the parent and child afterward. But I feel like there is, or should be, a better, less intrusive way to do this, hence the question above.
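The explicit approach described above might look roughly like the following sketch. `Demo::Store` and its `attach`/`detach` methods are hypothetical names, and a plain temporary file stands in for the BerkeleyDB environment and tied variables. The point is only the calling pattern the user has to follow around `fork`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the BerkeleyDB-backed shared object.
package Demo::Store;

sub new {
    my ( $class, $path ) = @_;
    my $self = bless { path => $path }, $class;
    return $self->attach;
}

# Open (or re-open) the backing file; stands in for opening the
# environment and tying the shared variables.
sub attach {
    my ($self) = @_;
    open my $fh, '+>>', $self->{path} or die "open $self->{path}: $!";
    $self->{fh} = $fh;
    return $self;
}

# Close the backing file; stands in for untying and closing the environment.
sub detach {
    my ($self) = @_;
    my $fh = delete $self->{fh};
    close $fh if $fh;
    return $self;
}

sub is_attached { return defined $_[0]{fh} }

package main;

my ( undef, $path ) = tempfile( UNLINK => 1 );
my $store = Demo::Store->new($path);

$store->detach;                # explicit call before forking
my $pid = fork();
die "Unable to fork: $!" unless defined $pid;
$store->attach;                # explicit call in both parent and child

if ( $pid == 0 ) { exit 0 }    # child
waitpid $pid, 0;               # parent reaps the child
```

The cost of this pattern is that every caller must remember both calls around every `fork`, which is the intrusiveness the override is meant to remove.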
Replies are listed 'Best First'.

Re: To override fork, or not to override fork
by Anonymous Monk on Dec 21, 2011 at 19:07 UTC
by learnedbyerror (Monk) on Dec 22, 2011 at 09:10 UTC
by learnedbyerror (Monk) on Dec 21, 2011 at 19:25 UTC

Re: To override fork, or not to override fork
by Tanktalus (Canon) on Dec 24, 2011 at 15:09 UTC