pangj has asked for the wisdom of the Perl Monks concerning the following question:

Hi,Monks, I have a matter which has troubled me for some days.I wrote a socket server with perl to receive 300 client's coming datas.(Exactly,they're two socket servers running on the same host and binding on different ports,one of them receive 200 clients requests,another receive 100 clients requests.)They ran well before,until this year I move them to another host (Both hosts are RH Linux9 OS,the kernel is 2.4.21),where it run with CLOSE_WAIT problems.When they run,there are so many CLOSE_WAIT appear under the 'netstat -an' command.The numbers of 'CLOSE_WAIT' are more than 2000,then my two programs seem died and don't receive any request.I try some ways,while can't get rid of it.I would copy the codes and paste them here.Could anyone kindly look at it and tell me why this happen and how to resolve it?thanks a lot.
#!/usr/bin/perl use strict; use warnings; use DBI; use IO::Socket qw(:DEFAULT :crlf); use POSIX qw(WNOHANG setsid); use Fcntl qw(:DEFAULT :flock); use MLDBM 'DB_File'; #----------------------------------- # socket parameters use constant PORT => 7789; use constant IPADDR => '192.168.3.231'; use constant PID_FILE => "/home/msfw/msfwsvr.pid"; use constant SOCK_LOCK => "/home/msfw/msfw.lock"; # the flow limit use constant LIMIT => dddddddd; # the child process number use constant PREFORK_CHILD => 5; # the max request numbers which each child can handle use constant MAX_REQUEST => 500; # some global vars my $DONE = 0; # used for signal switch my %CHILDREN; # keep the child id in this hash my $CHILD_COUNT = 0; # the child count # redefine SIG handler $SIG{CHLD}=sub {while((my $child=waitpid(-1,WNOHANG))>0){delete $CHILD +REN{$child};$CHILD_COUNT--}}; $SIG{TERM}=$SIG{INT}=sub {$DONE++}; $SIG{__DIE__}=\&log_die; $SIG{__WARN__}=\&log_warn; # running environment my $rundir='/home/msfw/xyz'; my $domain='xyz'; my $db="$rundir/msfw.db"; # define log # there are 3 logLevel:0,1,2; level 0 is fulllog; set logLevel >=3 to +disable log my $logLevel = 0; my $log="msfwsvr.log"; my $err_log="msfwsvr.err"; #----------------------------------- #----------------------------------- if (-e PID_FILE) { open (HD,PID_FILE) or die "$!"; my $pid=<HD>; close HD; die "process is still run" if kill 0 => $pid; die "can't remove pid file" unless -w PID_FILE && unlink PID_FILE; } if (-e SOCK_LOCK) { die "can't remove socket lock file" unless -w SOCK_LOCK && unlink +SOCK_LOCK; open (HDW,">",SOCK_LOCK) or die "$!"; close HDW; }else{ open (HDW,">",SOCK_LOCK) or die "$!"; close HDW; } open (HDW,">",PID_FILE) or die "$!"; my $pid=daemon(); print HDW $pid; close HDW; my %records; my $listen_socket = IO::Socket::INET->new( LocalAddr => IPADDR, LocalPort => PORT, Listen => 1000, Proto => 'tcp', Reuse => 1, Timeout => 30, ); die "can't create socket: $@" unless defined $listen_socket; while(!$DONE) { make_new_child() while $CHILD_COUNT < PREFORK_CHILD; sleep; } kill_children(); warn "msfwsvr exit normally"; exit 0; #----------------------------------- sub make_new_child { my $child = fork(); die "can't fork $!" unless defined $child; if ($child) { warn "launching child $child"; $CHILDREN{$child} = 1; $CHILD_COUNT++; }else{ handle($listen_socket); exit 0; } } sub lock_it { open (SOCKLOCK,SOCK_LOCK) or die "open lock file fail: $!"; flock (SOCKLOCK,LOCK_EX); } sub unlock_it { flock (SOCKLOCK,LOCK_UN); close SOCKLOCK; } sub lock_tie { open (HD,PID_FILE) or die "open lock file fail: $!"; flock (HD,LOCK_EX); tie(%records, 'MLDBM', $db, O_CREAT|O_RDWR, 0644) or die "Can't open DB_File $db : $!\n"; } sub unlock_tie { untie %records; flock (HD,LOCK_UN); close HD; } sub handle { my $sock = shift; my $cycles = MAX_REQUEST; while ($cycles--) { lock_it(); last unless my $c = $sock->accept; unlock_it(); my $line = <$c>; close $c; do_real_thing($line); } close $sock; } sub do_real_thing { my $line = shift; { local $/=CRLF; chomp $line; } return unless $line =~ /^(.*):(\d+):(get|put)$/; my ($mid,$size,$type) = split(/:/,$line); my $timestamp = time(); lock_tie(); my $entry = $records{$mid}; # RIGHT $entry->{$timestamp} += $size; $records{$mid} = $entry; unlock_tie(); write_log("current socket info:",$$,$timestamp,$mid,$entry->{$time +stamp}, scalar keys %{$entry}) if $logLevel == 0; my $sum; for my $time (keys %{$entry}) { if ($time < $timestamp - 3600) { lock_tie(); my $tmp = $records{$mid}; delete $tmp->{$time}; $records{$mid} = $tmp; unlock_tie(); if ($logLevel == 0 or $logLevel == 1) { my $tmp = $records{$mid}; unless (defined $tmp->{$time}){ write_log("delete old timestamp success:",$$,$mid, +$time); }else{ write_log("can't delete old timestamp:",$$,$mid,$t +ime); } } }else{ $sum += $entry->{$time}; } } if ($sum > LIMIT) { insertDB($mid,$sum); write_log("insert to db:",$$,$mid,$sum) if $logLevel == 0 or $logLevel == 1 or $logLevel == 2; lock_tie(); delete $records{$mid}; unlock_tie(); if ($logLevel == 0 or $logLevel == 1) { unless (defined $records{$mid}) { write_log("clean item success:",$$,$mid); }else{ write_log("can't clean item:",$$,$mid); } } } } sub kill_children { local $SIG{TERM} = $SIG{INT} = 'DEFAULT'; kill TERM => keys %CHILDREN; sleep while %CHILDREN; } sub daemon { my $child = fork(); die "can't fork" unless defined $child; exit 0 if $child; setsid(); open (STDIN, "</dev/null"); open (STDOUT, ">/dev/null"); open (STDERR,">&STDOUT"); chdir $rundir; umask(022); $ENV{PATH}='/bin:/usr/bin:/sbin:/usr/sbin'; return $$; } sub insertDB { # some codes for database here } sub write_log { my $time=scalar localtime; open (HDW,">>",$log); flock (HDW,LOCK_EX); print HDW $time," ",join ' ',@_,"\n"; flock (HDW,LOCK_UN); close HDW; } sub log_die { my $time=scalar localtime; open (HDW,">>",$err_log); print HDW $time," ",@_; close HDW; die @_; } sub log_warn { my $time=scalar localtime; open (HDW,">>",$err_log); print HDW $time," ",@_; close HDW; } #-----------------------------------

Replies are listed 'Best First'.
Re: Why I got so many CLOSE_WAIT
by jonadab (Parson) on Feb 23, 2006 at 12:09 UTC

    Are you reaping your child processes?

    If you don't want to deal with child processes at all when they terminate (i.e., just let them finish and that's that and let them disappear from the process table as soon as they're done), you can set $SIG{CHLD} to 'IGNORE' (if I'm remembering correctly). If you need to do something more elaborate, then that signal entry should be set to a coderef, as usual.

    As far as pasting code into Perlmonks, you should wrap it in <code> tags and, if it's lengthy, also in <readmore> tags, like this...

    Here is some example code below...
    <readmore><code>
    #!/usr/bin/perl # Your code here.
    </code></readmore>

    update: Here's his handler, from the code at the other end of the URI he provided:

    $SIG{CHLD}=sub {while((my $child=waitpid(-1,WNOHANG))>0){delete $CHILDREN{$child};$CHILD_COUNT--}};

    I don't see what's wrong with that (apart from the minor redundancy of keeping a $CHILD_COUNT when it looks like calling keys %CHILDREN in scalar context would probably provide the same information), but I haven't done a lot in the past with reaping child processes, so there could be something I'm overlooking.


    Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
Re: Why I got so many CLOSE_WAIT
by marto (Cardinal) on Feb 23, 2006 at 12:09 UTC
    Hi pangj,

    Welcome to the Monastery. When posting code you need to have it between <code> </code> tags. Take a look at Writeup Formatting Tips for more information. Also you may want to have a look at the PerlMonks FAQ if you have not done so already.

    Hope this helps.

    Martin
      Thanks for your helps.Now I can paste my codes here. I'm really troubled by the CLOSE_WAIT problems.I'm appreciated for anyone here could kindly help me with this problem.
Re: Why I got so many CLOSE_WAIT
by leighsharpe (Monk) on Feb 23, 2006 at 22:50 UTC
    CLOSE_WAIT is not dealt with by perl at all. It's all done at the OS level. More specifically, it's part of the TCP stack. There's not much you can do about it other than tuning your operating system parameters. I had a similar situation on a Windows box a few years ago. I had large numbers of quite frequent connections from both mysql and a server I had written in perl, much like yours. I also found the CLOSE_WAIT staes giving me anguish. In the end I had to tell Windows to back off the time it held a connection in CLOSE_WAIT state. Down to only a few seconds, I think it was.
Re: Why I got so many CLOSE_WAIT
by GrandFather (Saint) on Feb 23, 2006 at 12:20 UTC

    To follow up on marto's reply, it is generally a bad idea to reference transient material as you have done because your question stays around so others with a similar problem can refer to it. If an important part of the question is no longer available the question and replies are much less useful to people in the future.

    You should put the contents of the file that you link to in code tags inside a readmore block. As marto suggests, see Writeup Formatting Tips for the details of how to do this. Note too that it is conventional (to make your text easier to read) to put a space after full stops and commas.


    DWIM is Perl's answer to Gödel