liverpole has asked for the wisdom of the Perl Monks concerning the following question:
Greetings fellow monks,
I have written a server program in Perl, which, running under Linux, uses IO::Socket and IO::Select to service remote messages through a socket connection. It seems to work fine most of the time, but every once in a while the program mysteriously terminates without warning. (It's supposed to run in an endless loop.)
Since I was unable to figure out the cause, I wrote a C wrapper which executes the Perl script in a system() call, and then reports its exit status. By doing this I was able to determine that it's exiting due to a segmentation violation (at least on some occasions).
Has anyone else had such experiences with these modules? Or does it sound like an unrelated problem? Any other general ideas for tracking down the cause of this?
@ARGV=split//,"/:L";
map{print substr crypt($_,ord pop),2,3}qw"PerlyouC READPIPE provides"
Re: Any Known Problems with IO::Socket and/or IO::Select?
by zentara (Archbishop) on Feb 05, 2006 at 11:34 UTC
I would try running the script under strace and see what the output is when it crashes. It sounds like a library problem.
I'm not really a human, but I play one on earth.
flash japh
Good suggestion, zentara, I'll give that a try.
This is a long shot, but since you frequently post Tk code, this script wouldn't happen to contain Tk, would it? If so, there was a recently fixed problem with random segfaults.
Re: Any Known Problems with IO::Socket and/or IO::Select?
by monarch (Priest) on Feb 04, 2006 at 23:05 UTC
I would bet on the unrelated problem.
I have seen, in production, a Perl script that regularly polls 100-odd routers using callback functions, IO::Socket::INET and IO::Select, with IO::Select triggering a callback depending on which socket the response came in on.
Update: just to confirm, the Perl script in production has been running for 18449 minutes 41 seconds (about 12.8 days) as of today.
Re: Any Known Problems with IO::Socket and/or IO::Select?
by robin (Chaplain) on Feb 06, 2006 at 12:50 UTC
I would try building a debugging perl (i.e. with Configure option -Doptimize=-g), and make sure you can dump core files (check that ulimit -c is large enough, preferably unlimited).
Then run your server under the debugging perl. When it segfaults and dumps core, open up the core file in gdb (gdb -c core) and do a backtrace (bt) to see where it's crashing.
Until you know where the crash is coming from, you're a bit stuck. (Like others here, I'm inclined to doubt that the problem is with IO::Socket or IO::Select, because those modules are both widely used and well-tested, but it's not impossible.)
That's a good suggestion. My Perl is already built with debugging enabled, so I'll try that next. If I find anything, I will certainly report my results here.
Re: Any Known Problems with IO::Socket and/or IO::Select?
by liverpole (Monsignor) on Feb 07, 2006 at 21:07 UTC
Okay, here's an update. I don't think I've fully solved the problem, because I'm pretty sure a SIGSEGV is still occurring (I just haven't seen one yet today). However, I did find a place where I was inadvertently exiting the application, in a separate module containing the Tk code for the GUI. I wanted the GUI to go away after notification, but I called:
Tk::exit;
instead of:
$mw->destroy;
which of course killed the calling application rather than just the GUI.
I'm still trying to track down the SIGSEGV.
Re: Any Known Problems with IO::Socket and/or IO::Select?
by liverpole (Monsignor) on Feb 10, 2006 at 13:05 UTC
A final update ...
I followed robin's advice, running the script in the Perl debugger. Naturally that caused the script to run flawlessly; it was running for more than a day before I finally stopped it. I switched back to running it under control of the C program:
#include <stdio.h>
#include <stdlib.h>   /* system(), exit() */
#include <signal.h>   /* SIGINT */

int main(void)
{
    const char *red  = "\033[101m";
    const char *cyan = "\033[106m";
    const char *off  = "\033[m";

    while (1) {
        /* system() returns a wait()-style status, or -1 if it could not run */
        int result = system("perl ./done_server direct");
        if (result == -1) {
            perror("system");
            exit(1);
        }
        int sigval  = result & 0x7f;               /* signal that killed the child */
        int b_core  = (result & 0x80) ? 1 : 0;     /* 1 if a core was dumped */
        int exitval = (result >> 8) & 0xff;        /* exit code on a normal exit */

        if (SIGINT == sigval) {
            /* User typed ^C */
            printf("\n%s** Break **%s\n", cyan, off);
            exit(1);
        }
        printf("\n%sERROR -- The done_server exited%s\n", red, off);
        printf("%sResult = %d [%08x]%s\n", red, result, (unsigned)result, off);
        printf("%sCoredump = %d%s\n", red, b_core, off);
        printf("%sExit val = %d%s\n", red, exitval, off);
        printf("Respawning ...\n");
    }
}
/*******************
*** End of main ***
*******************/
Unlike before, though, it's never getting the SIGSEGV it was getting earlier, so it never needs to be restarted. Meanwhile, it's happily servicing remote messages without any problems, both when run in debug mode, and now again from the C wrapper. I never did need to apply the patch which zentara referenced.
Given the choice between knowing where a bug is but being unable to fix it, and having a bug disappear without knowing why, the latter is certainly preferable.
But it's perplexing!