hubb0r has asked for the wisdom of the Perl Monks concerning the following question:
Hello thelonious monks!
I have an issue that I'm having a terrible time figuring out. I have a small server app that listens on a socket and feeds a list of server names to a client that connects (this is internal to my network). The application retrieves it's list of servers to serve from another perl script that does some health testing on the allowable machines and returns an ordered list to this server app.
Ok, so it is all working really well, except for one little thing. The program that does the checking causes a little load on the servers that it checks, and this "server server" can get a pretty good load up on it. I decided that it was necessary to make the "server server" cache the content that it is serving for up to 10 seconds, and to refresh it's list if it gets a request where the data is more than 10 seconds stale.
So, this works for some of the listening servers, but not all. The "last_update_time" is stored at a global level for the server (although I've tried many ways of setting it, including passing a reference to it) and there is one or more servers that do not update the time no matter what I've tried. There is extensive logging going on, and it is very difficult to tell which is the parent and which is the child and which is not handling the requests as the pid changes with each request (I don't fully understand the module implementation, but it appears to be very apache-like in nature).
Either way, here is the server code:
Here is the config file for the Net::Server::PreForkSimple:#!/usr/local/bin/perl -w package speak_db_names; use Net::Server::PreForkSimple; @ISA = qw{Net::Server::PreForkSimple}; use strict; use Time::Format qw{%time}; use Data::Dumper; use File::Basename; my $basedir = dirname($0); my $refresh_time = 10; my $last_server_update; my $servers = [ ]; $servers = load_data($servers, undef); speak_db_names->run(conf_file => 'speak_db_names.conf'); ## sub process_request {{{ sub process_request { my $self = shift; $servers = load_data($servers, $self); eval { local $SIG{ALRM} = sub { die "Timed Out" }; my $timeout = 2; my $previous_alarm = alarm($timeout); print join( ',', @$servers); alarm($timeout); } } ## }}} ## sub load_data {{{ sub load_data { my $servers = shift; my $self = shift; unless ($last_server_update) { $last_server_update = 0 } if ( (time - $last_server_update) > $refresh_time ) { Net::Server::write_to_log_hook($self, 3, 'DB Cache expired: ' . (time - $last_server_ +update) . ' Seconds Old... Relo +ading'); Net::Server::write_to_log_hook($self, 3, 'Now: ' . time . " Last Upd: $last_ser +ver_update"); my $ret = `$basedir/check_database_health.pl .1 2>>/var/log/ch +eck_db_names.log`; $last_server_update = time; @$servers = split(/\t/, $ret); return $servers; } else { Net::Server::write_to_log_hook($self, 3, 'Reusing DB Cache ' . (time - $last_server_ +update) . ' Seconds old'); return $servers; } } ## }}} 1;
and here is a snippet of log file showing the issue:min_spare_servers 1 min_servers 2 max_spare_servers 2 max_servers 3 max_requests 1000 check_for_dead 120 log_file /var/log/speak_db_names.log log_level 3 pid_file /tmp/speak_db_names.pid background 1 port 3307 child_communication 1
Starting "1" children Starting "1" children 2005/11/28-19:43:35 CONNECT TCP Peer: "172.17.10.57:36685" Local: "172 +.17.10.75:3307" DB Cache expired: 474444 Seconds Old... Reloading Now: 1133225015 Last Upd: 1132750571 2005/11/28-19:43:36 CONNECT TCP Peer: "172.17.10.57:36688" Local: "172 +.17.10.75:3307" DB Cache expired: 474445 Seconds Old... Reloading Now: 1133225016 Last Upd: 1132750571 2005/11/28-19:43:36 CONNECT TCP Peer: "172.17.10.57:36689" Local: "172 +.17.10.75:3307" DB Cache expired: 474445 Seconds Old... Reloading Now: 1133225016 Last Upd: 1132750571 2005/11/28-19:43:36 CONNECT TCP Peer: "172.17.10.57:36690" Local: "172 +.17.10.75:3307" Reusing DB Cache 1 Seconds old 2005/11/28-19:43:36 CONNECT TCP Peer: "172.17.10.57:36691" Local: "172 +.17.10.75:3307" Reusing DB Cache 0 Seconds old 2005/11/28-19:43:37 CONNECT TCP Peer: "172.17.10.57:36692" Local: "172 +.17.10.75:3307" Reusing DB Cache 1 Seconds old 2005/11/28-19:43:37 CONNECT TCP Peer: "172.17.10.57:36693" Local: "172 +.17.10.75:3307" Reusing DB Cache 1 Seconds old 2005/11/28-19:43:38 CONNECT TCP Peer: "172.17.10.57:36694" Local: "172 +.17.10.75:3307" Reusing DB Cache 2 Seconds old Starting "1" children Starting "1" children 2005/11/28-19:43:38 CONNECT TCP Peer: "172.17.10.57:36695" Local: "172 +.17.10.75:3307" Reusing DB Cache 2 Seconds old 2005/11/28-19:43:39 CONNECT TCP Peer: "172.17.10.57:36696" Local: "172 +.17.10.75:3307" Reusing DB Cache 3 Seconds old 2005/11/28-19:43:39 CONNECT TCP Peer: "172.17.10.57:36697" Local: "172 +.17.10.75:3307" Reusing DB Cache 3 Seconds old 2005/11/28-19:43:40 CONNECT TCP Peer: "172.17.10.57:36698" Local: "172 +.17.10.75:3307" Reusing DB Cache 4 Seconds old 2005/11/28-19:43:40 CONNECT TCP Peer: "172.17.10.57:36699" Local: "172 +.17.10.75:3307" Reusing DB Cache 4 Seconds old 2005/11/28-19:43:40 CONNECT TCP Peer: "172.17.10.57:36700" Local: "172 +.17.10.75:3307" DB Cache expired: 474449 Seconds Old... Reloading Now: 1133225020 Last Upd: 1132750571 2005/11/28-19:43:41 CONNECT TCP Peer: "172.17.10.57:36701" Local: "172 +.17.10.75:3307" DB Cache expired: 474450 Seconds Old... Reloading Now: 1133225021 Last Upd: 1132750571
As you can see, the instances that do not update the time have never updated it from the initial load of the daemon. The others update it all the time.
Please help! It seems that more often than not the server that does not update the time gets hit, which causes both a strain on the machines being checked as well as a longer connection time waiting for the check to occur.
Thanks so much for your help!
-Hubb0r
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Net::Server::PreForkSimple and Children
by salvix (Pilgrim) on Nov 30, 2005 at 16:42 UTC | |
Re: Net::Server::PreForkSimple and Children
by Rhandom (Curate) on Nov 30, 2005 at 17:19 UTC | |
by Rhandom (Curate) on Nov 30, 2005 at 17:49 UTC | |
by hubb0r (Pilgrim) on Nov 30, 2005 at 20:38 UTC |