ruzam has asked for the wisdom of the Perl Monks concerning the following question:

I don't quite know where to begin with this one.

I have a daemon process which has been dying unexpectedly and without any kind of warning. I think I've narrowed it down to an untie which doesn't return (or at the very least takes an abnormally long time to return).

Here are the open and close functions:
    sub _open_db {
        my $self = shift;
        warn "_open_db $self $self->{NAME}\n";

        # sorted index
        $DB_BTREE->{'flags'}   = 0;
        $DB_BTREE->{'compare'} = sub {
            my ($key1, $key2) = @_;
            $key1 <=> $key2;
        };

        if (tie(my %ip_db, 'DB_File', $self->{IP_FILE},
                O_RDWR|O_CREAT, 0644, $DB_BTREE)) {
            # Enable duplicate keys as well
            $DB_BTREE->{'flags'} = R_DUP;
            if (my $stamp_tie = tie(my %stamp_db, 'DB_File', $self->{STAMP_FILE},
                                    O_RDWR|O_CREAT, 0644, $DB_BTREE)) {
                $self->{IP_DB}     = \%ip_db;
                $self->{STAMP_DB}  = \%stamp_db;
                $self->{STAMP_TIE} = $stamp_tie;
                warn "_open_db OK IP_DB: $self->{IP_DB}"
                   . " STAMP_DB: $self->{STAMP_DB} STAMP_TIE: $self->{STAMP_TIE}\n";
                return $self;
            }
            else {
                $Err_Msg = "Can't open stamp db $self->{STAMP_FILE}: $!";
            }
            warn "_open_db failed $Err_Msg\n";
            untie %ip_db;
            warn "_open_db untie complete\n";
        }
        else {
            $Err_Msg = "Can't open IP db $self->{IP_FILE}: $!";
        }
        warn "_open_db failed $Err_Msg\n";
        return 0;
    }

    sub close_db {
        my $self = shift;
        warn "close_db $self\n";

        delete $self->{STAMP_TIE};
        warn "delete STAMP_TIE complete\n";
        untie %{$self->{STAMP_DB}};
        warn "untie STAMP_DB complete\n";
        delete $self->{STAMP_DB};
        warn "delete STAMP_DB complete\n";
        untie %{$self->{IP_DB}};
        warn "untie IP_DB complete\n";
        delete $self->{IP_DB};
        warn "delete IP_DB complete\n";
        return 1;
    }
The databases are opened:
    _open_db Db_file=HASH(0x838ad20) blacklist
    _open_db OK IP_DB: HASH(0x83f4960) STAMP_DB: HASH(0x83f429c) STAMP_TIE: DB_File=SCALAR(0x83670f4)
    _open_db Db_file=HASH(0x83670e8) notify
    _open_db OK IP_DB: HASH(0x8367040) STAMP_DB: HASH(0x83658b8) STAMP_TIE: DB_File=SCALAR(0x841651c)
    _open_db Db_file=HASH(0x83b4e84) notify_events
    _open_db OK IP_DB: HASH(0x838adbc) STAMP_DB: HASH(0x83b4e90) STAMP_TIE: DB_File=SCALAR(0x841663c)
    _open_db Db_file=HASH(0x842e370) tracking
    _open_db OK IP_DB: HASH(0x84165dc) STAMP_DB: HASH(0x84165d0) STAMP_TIE: DB_File=SCALAR(0x842f914)
    _open_db Db_file=HASH(0x80f53f0) tracking
    _open_db OK IP_DB: HASH(0x842f8b4) STAMP_DB: HASH(0x842e43c) STAMP_TIE: DB_File=SCALAR(0x80f5540)
Then some time later, the databases are closed:
    close_db Db_file=HASH(0x842e370)
    delete STAMP_TIE complete
    untie STAMP_DB complete
    delete STAMP_DB complete
    untie IP_DB complete
    delete IP_DB complete
    close_db Db_file=HASH(0x80f53f0)
    delete STAMP_TIE complete
One database is closed cleanly, but the second never returns from its first untie. The close/open cycle can repeat successfully for days on end, or it can hang on the first try. Sometimes it stops on the first untie, sometimes the second. I haven't been able to reliably reproduce the error, but I've noticed I can 'encourage' it by generating more activity against the databases. The databases aren't unusually large: maybe 200k for the largest, and most are under 50k. I'm at a loss.
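One thing worth checking is the inner-references gotcha documented for untie: if any copy of the tied object (like the STAMP_TIE you stash in $self) is still alive when untie is called, the underlying DB_File object isn't actually destroyed at that point. With warnings enabled, Perl reports this. A minimal sketch, not the daemon code; the file name is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;          # needed to see the "inner references" warning
use DB_File;
use Fcntl;

my %h;
# Keep a copy of the tie object, the same way _open_db keeps STAMP_TIE
my $tie_obj = tie %h, 'DB_File', '/tmp/demo.db', O_RDWR|O_CREAT, 0644, $DB_BTREE;
$h{1} = 'a';

# $tie_obj still references the DB_File object, so this untie warns:
#   "untie attempted while 1 inner references still exist"
# and the database isn't really closed until $tie_obj goes away.
untie %h;

undef $tie_obj;        # drop the extra reference, then untie is clean
```

In close_db the delete of STAMP_TIE happens before the untie, which should be enough, but if another copy of that object survives anywhere (a closure, a temporary), the real close is deferred and can interact badly with later opens.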

Replies are listed 'Best First'.
Re: Problem with untie
by TOD (Friar) on Feb 20, 2007 at 04:24 UTC
    hmm... i'm reflecting on your problem, and i can't say i know a solution. but two things come to mind: first, if you install handlers for the standard signals, you might get an error message even if perl itself doesn't produce one, e.g.:
    local $SIG{HUP} = $SIG{TRAP} = ... = \&sig_message;
    ...
    sub sig_message {
        my $sig = shift;
        print STDERR "Got signal $sig\n";
    }
    the second thought i have is that one shouldn't always trust berkeley db's. at least with GDBM_File i encountered severe data losses. if you don't want to rely on a sql database, and the amount of data is manageable, you'll possibly be better off using plain files. if you serialize your data hashes via Storable.pm you'll get clear and reproducible results.
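A minimal sketch of that Storable approach, with an illustrative file name; Storable also provides lock_store/lock_retrieve, which take an exclusive flock while writing:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Serialize the whole hash to a plain file instead of tying it to a dbm
my %ip_db = ( '10.0.0.1' => time() );
store \%ip_db, '/tmp/ip_db.stor';

# Read it back as a hash reference
my $restored = retrieve '/tmp/ip_db.stor';
print "$_ => $restored->{$_}\n" for keys %$restored;
```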
Re: Problem with untie
by perrin (Chancellor) on Feb 20, 2007 at 13:41 UTC
    Does this daemon do some forking with open file handles? It doesn't look like you're using BerkeleyDB's concurrency options, which means you have to lock it if you want to access it from multiple processes. You can try MLDBM::Sync for a quick fix.
      The daemon forks, then opens a socket to communicate over. All access to the databases comes from the same process (clients request access through the socket). The side effect of untie not returning is that the daemon stops responding on the socket, and the client code takes that to mean a dead process and 'cleans up'. So as far as I know, only one process ever accesses the databases at any given time.

      I'm going to rewrite the code to use flat files and see if that fixes it (good thing I wrapped it up in a package first!). The data being stored is simple text and doesn't need any fancy massaging.
        I still suggest you just swap direct DB_File for MLDBM::Sync. What you have seems like a locking issue, but without looking at the rest of your code, it's very hard to find it. You should keep in mind that not all of the data you write to DB_File will go to disk until you untie it, and that can sometimes confuse people. MLDBM::Sync takes care of all that for you.
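For reference, the MLDBM::Sync swap is small; a hedged sketch with an illustrative file name, assuming DB_File as the underlying dbm and Storable as the serializer:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MLDBM::Sync;                  # flock-based locking around each access
use MLDBM qw(DB_File Storable);   # underlying dbm and value serializer
use Fcntl qw(:DEFAULT);

tie my %ip_db, 'MLDBM::Sync', '/tmp/ip_db.dbm', O_CREAT|O_RDWR, 0640
    or die "Can't tie /tmp/ip_db.dbm: $!";

# Each read/write acquires the lock, syncs, and releases it
$ip_db{'10.0.0.1'} = { stamp => time() };

untie %ip_db;
```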