psini has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I experienced a strange problem this morning on a production server running a web service program based on Net::Server::Prefork.
Disclaimer: I can't reproduce the problem, so I can't post the relevant code (the whole program is about 20k lines long). My question is therefore a very general one: "what could have happened?" or, better, "what could I monitor to get more information the next time it happens?".
Now for what happened. This morning one of my customers called to say that the program was down. I connected to his server and found that the daemon was not running; a quick search through syslog turned up the following:

    Jun 16 09:50:55 lxinf15 data_server[17854]: 2008/06/16-09:50:55 CONNECT TCP Peer: "127.0.0.1:58684" Local: "127.0.0.1:9999"
    Jun 16 09:50:55 lxinf15 data_server[14582]: Starting "1" children
    Jun 16 09:50:55 lxinf15 data_server[18578]: Child Preforked (18578)
    Jun 16 09:50:55 lxinf15 data_server[17854]: Parent process gone away. Shutting down
    Jun 16 09:50:55 lxinf15 data_server[17257]: 2008/06/16-09:50:55 CONNECT TCP Peer: "127.0.0.1:58685" Local: "127.0.0.1:9999"
    Jun 16 09:50:55 lxinf15 data_server[17257]: Parent process gone away. Shutting down
    Jun 16 09:50:56 lxinf15 data_server[17494]: 2008/06/16-09:50:56 CONNECT TCP Peer: "127.0.0.1:58687" Local: "127.0.0.1:9999"
    Jun 16 09:50:56 lxinf15 data_server[17494]: Parent process gone away. Shutting down
    Jun 16 09:50:57 lxinf15 data_server[14048]: 2008/06/16-09:50:57 CONNECT TCP Peer: "127.0.0.1:58688" Local: "127.0.0.1:9999"
    Jun 16 09:50:57 lxinf15 data_server[14048]: Parent process gone away. Shutting down
    Jun 16 09:50:57 lxinf15 data_server[1518]: 2008/06/16-09:50:57 CONNECT TCP Peer: "127.0.0.1:58689" Local: "127.0.0.1:9999"
    Jun 16 09:50:57 lxinf15 data_server[1518]: Parent process gone away. Shutting down
    Jun 16 09:51:01 lxinf15 /USR/SBIN/CRON[18585]: (www-data) CMD (/usr/bin/php4-cgi -q /var/systes/Sister/www_sister/pages/rapporti/RapportiBatch.php)
    Jun 16 09:51:04 lxinf15 data_server[18578]: 2008/06/16-09:51:04 CONNECT TCP Peer: "127.0.0.1:58691" Local: "127.0.0.1:9999"
    Jun 16 09:51:04 lxinf15 data_server[18578]: Parent process gone away. Shutting down
    Jun 16 09:51:15 lxinf15 data_server[17663]: Parent process gone away. Shutting down

data_server is my daemon process (yes, my names are always that original). It seems that at 09:50:55 the server received a connection, the parent (PID 14582) started one new child (PID 18578), and then the parent silently died. Over the next 20 seconds the children shut down one after another, each logging "Parent process gone away. Shutting down".
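The children's "Parent process gone away" messages suggest that each one checked, between requests, whether its parent still existed. A minimal sketch of that kind of check in plain Perl, assuming nothing about how Net::Server::Prefork actually implements its own test (its real logic may differ); `process_alive` and `parent_gone` are hypothetical helpers:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EPERM);

# True if a process with this PID exists. "kill 0" delivers no signal;
# it only checks whether a signal could be sent. EPERM means the process
# exists but belongs to another user, so it still counts as alive.
sub process_alive {
    my ($pid) = @_;
    return 1 if kill(0, $pid);
    return $! == EPERM ? 1 : 0;
}

# Hypothetical check a preforked child could run between requests.
# $parent_pid is the parent's PID, recorded when the child was forked.
sub parent_gone {
    my ($parent_pid) = @_;
    # After the parent exits, the child is re-parented (normally to init),
    # so getppid() stops matching the recorded PID.
    return 1 if getppid() != $parent_pid;
    return process_alive($parent_pid) ? 0 : 1;
}

# Demo: fork a process that exits at once, reap it, then query it.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
exit 0 if $pid == 0;    # child: exit immediately
waitpid($pid, 0);       # parent: reap the child so its PID is released

print 'self alive: ',         process_alive($$)      ? 'yes' : 'no', "\n";
print 'reaped child alive: ', process_alive($pid)    ? 'yes' : 'no', "\n";
print 'own parent gone: ',    parent_gone(getppid()) ? 'yes' : 'no', "\n";
```

Because a child only runs a check like this when it wakes up (here, after handling a request), an idle child can outlive its parent for a while, which would fit the staggered shutdowns in the log.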
What I don't understand is why the daemon died, and why its death left no trace in the logs.
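One way to get "last words" from the daemon next time is to log every exit path Perl can intercept: uncaught `die`, catchable signals, and normal exit via an `END` block. A process killed with SIGKILL runs none of these, so if such logging stays silent, that silence is itself a clue. A minimal standalone sketch; `last_words`, the `LAST_WORDS_LOG` environment variable, and the default log path are hypothetical choices for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use POSIX qw(strftime SIGHUP SIGINT SIGTERM);

# Hypothetical log file; a real daemon might write to syslog instead.
my $log_file = $ENV{LAST_WORDS_LOG}
    || File::Spec->catfile(File::Spec->tmpdir, 'data_server_last_words.log');

# Append one timestamped line, tagged with the current PID.
sub last_words {
    my ($msg) = @_;
    chomp $msg;
    open my $fh, '>>', $log_file or return;
    print {$fh} strftime('%Y/%m/%d-%H:%M:%S', localtime), " [$$] $msg\n";
    close $fh;
}

# Log exceptions that are about to kill the process; skip dies that an
# eval will catch ($^S is true while executing inside an eval).
$SIG{__DIE__} = sub {
    return if $^S;
    last_words("died: $_[0]");
};

# Catchable termination signals: log, then exit with 128 + signal number,
# the same convention shells use.
my %signo = (HUP => SIGHUP, INT => SIGINT, TERM => SIGTERM);
for my $name (keys %signo) {
    $SIG{$name} = sub {
        last_words("caught SIG$name");
        exit(128 + $signo{$name});
    };
}

# END runs on a normal exit and after an uncaught die, never after SIGKILL.
END { last_words("exiting with status $?") }
```

Net::Server installs its own handlers for some of these signals, so in the real daemon the signal part would have to be merged with its handling rather than dropped in as-is; forked children also inherit the `END` block, which is why each line carries the PID. If the Linux OOM killer was the culprit, it normally leaves a message in the kernel log (visible with `dmesg`).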
This server has been in production for more than a month, serving several thousand calls every day, and its twin at another location has been up for three months with at least double the network load, not to mention the development and test servers. I never had such a problem before.
I'm totally baffled. Does anybody have even a faint idea of what I could try?
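A process killed with SIGKILL cannot report its own death, but whatever process waits for it can still read the cause from the wait status. A minimal supervisor sketch, assuming the daemon can be run so that its main process stays in the foreground (if it backgrounds itself, the process being waited on is not the long-running parent); `describe_status`, `run_and_report`, and the commented daemon path are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

$| = 1;    # unbuffered STDOUT so output order stays predictable around fork

# Turn a native wait status into a human-readable reason.
sub describe_status {
    my ($status) = @_;
    return 'exited with code ' . WEXITSTATUS($status) if WIFEXITED($status);
    return 'killed by signal ' . WTERMSIG($status)    if WIFSIGNALED($status);
    return "unknown wait status $status";
}

# Run a command as a child, wait for it, and report how it ended.
sub run_and_report {
    my (@cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # exec failed: leave via _exit so the child skips END blocks
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    waitpid($pid, 0);
    return describe_status(${^CHILD_ERROR_NATIVE});
}

# Demo with stand-in commands rather than the real daemon:
print run_and_report($^X, '-e', 'exit 3'), "\n";
print run_and_report($^X, '-e', 'kill "KILL", $$'), "\n";

# Hypothetical real use (placeholder path), with the daemon configured to
# stay in the foreground so this process waits on its actual parent:
# warn 'data_server ' . run_and_report('/path/to/data_server') . "\n";
```

A report of "killed by signal 9" combined with silence from the daemon's own logging would point to an external SIGKILL rather than an error inside the Perl code.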
Careful with that hash Eugene.
Re: Problem with Net::Server::Prefork - Server died w/o apparent reason
by TGI (Parson) on Jun 16, 2008 at 19:12 UTC
by psini (Deacon) on Jun 16, 2008 at 19:22 UTC
Re: Problem with Net::Server::Prefork - Server died w/o apparent reason
by jethro (Monsignor) on Jun 16, 2008 at 19:39 UTC
by psini (Deacon) on Jun 16, 2008 at 19:42 UTC