I have to devise a script that detects AF_INET sockets that seem broken and cause SIGBUS core dumps. These sockets are connections between a webserver accessible from the internet and a database server behind a firewall that services queries from the webserver.
The DBAs responsible for the DBMS suspect that the firewall is the culprit that severs routes that haven't been used for a certain period.
First, I need to know how to find out whether a socket is a probable candidate to throw a SIGBUS soon. My first shot at this merely parses the output of the netstat command, looking especially for sockets in the CLOSE_WAIT state (see the code sample below).
I'd rather do this through the Socket or IO::Socket module but don't know how to read the states of active sockets (like netstat displays them). Maybe someone can give me a hint in that direction?
Then I need an (almost) obvious criterion for killing processes that keep a broken end of a socket open. (How can I identify those in the absence of lsof or similar tools?)
I think all of this should already have been implemented in the applications that establish the sockets, in the form of a signal handler that closes sockets properly on receipt of signals such as SIGBUS, SIGPIPE etc. Unfortunately, the application that causes this is a black box to me, and I have no access to its code.
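For illustration only, here is a minimal Perl sketch of the kind of handling I mean, limited to SIGPIPE. The helper name guarded_write and the local socket pair standing in for the webserver/database link are my own assumptions, not anything taken from the black-box application.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper: write to a socket and close our end cleanly if
# the peer has gone away, instead of letting SIGPIPE kill the process.
sub guarded_write {
    my ($sock, $data) = @_;
    my $got_sigpipe = 0;
    local $SIG{PIPE} = sub { $got_sigpipe = 1 };   # record it, don't die

    my $written = syswrite $sock, $data;
    if ($got_sigpipe || !defined $written) {
        close $sock;       # release our end of the dead connection
        return 0;
    }
    return 1;
}

# Demonstration on a local socket pair (a stand-in for the real link).
socketpair(my $near, my $far, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
close $far;                # simulate the peer vanishing
my $ok = guarded_write($near, "SELECT 1\n");
print $ok ? "write succeeded\n" : "peer gone, socket closed\n";
```

The handler only sets a flag, and the socket is closed in the normal flow of control rather than inside the handler itself.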
Here is my first shot at dumping the socket states into an array with a little Perl script:

#!/opt/perl5/bin/perl
use 5.006;
use strict;
use warnings;

my @SOCKET_STATES = qw(ESTABLISHED SYN_SENT SYN_RECV FIN_WAIT1 FIN_WAIT2
                       TIME_WAIT CLOSED CLOSE_WAIT LAST_ACK LISTEN CLOSING
                       UNKNOWN);
my %state_counts;
my @inet_sockets = parse_netstat();
$state_counts{$_->{State}}++ for @inet_sockets;

# Sockets stuck in CLOSE_WAIT, sorted numerically by foreign IP octets
my @stale_socks = grep { $_->{State} eq 'CLOSE_WAIT' } @inet_sockets;
my @stale_ips =
    map  $_->[0],
    sort {$a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
       || $a->[3] <=> $b->[3] || $a->[4] <=> $b->[4]}
    map  [$_, split(/\./, $_->{Foreign_IP})],
    @stale_socks;

print "Summary of socket states for type AF_INET:\n\n";
my $sum = 0;
print map { $sum += $state_counts{$_};
            sprintf "%12s = %4u\n", $_, $state_counts{$_} }
      sort keys %state_counts;
printf "%s\n%12s = %4u\n\n", '-'x19, 'TOTAL', $sum;
my $n_stale = $state_counts{CLOSE_WAIT} || 0;
print "The $n_stale foreign addresses of sockets of state CLOSE_WAIT:\n\n";
print map {sprintf "%30s\n", $_->{Foreign_IP}.':'.$_->{Foreign_Port}} @stale_ips;

sub parse_netstat {
    my %CMD = (
        exe       => '/usr/bin/netstat',
        args      => [qw(-a -n -f inet)],
        dump_keys => [qw(Protocol Recv-Q Send-Q Local_IP Foreign_IP
                         State Local_Port Foreign_Port)],
    );
    local *NETSTAT;
    my @dump;
    my $pid = open NETSTAT, '-|';      # forking open: child's STDOUT is piped to us
    die "cannot fork '$CMD{exe} @{$CMD{args}}': $!\n" unless defined $pid;
    if ($pid) {
        while (<NETSTAT>) {
            s/^\s+|\s+$//g;
            my %rec;                   # fresh record for every line
            @rec{@{$CMD{dump_keys}}[0..5]} = split;
            # skip header and blank lines as well as non-TCP entries
            next unless defined $rec{Protocol} && $rec{Protocol} eq 'tcp';
            # split "a.b.c.d.port" into address and port
            @rec{@{$CMD{dump_keys}}[3,-2]} =
                $rec{$CMD{dump_keys}[3]} =~ /(\d+\.\d+\.\d+\.\d+|\*)\.(\d+|\*)/o;
            @rec{@{$CMD{dump_keys}}[4,-1]} =
                $rec{$CMD{dump_keys}[4]} =~ /(\d+\.\d+\.\d+\.\d+|\*)\.(\d+|\*)/o;
            push @dump, {%rec};
        }
    }
    else {
        exec $CMD{exe}, @{$CMD{args}};
        die "cannot exec '$CMD{exe}': $!\n";   # only reached if exec failed
    }
    close NETSTAT
        or die "cannot close pipe from '$CMD{exe} @{$CMD{args}}': $!\n";
    return @dump;
}
This dumps something like the following (the list of foreign IPs is omitted here):

$ perl socklst.pl|head -14
Summary of socket states for type AF_INET:

      CLOSED =    1
  CLOSE_WAIT =   40
 ESTABLISHED =   28
  FIN_WAIT_1 =    1
  FIN_WAIT_2 =  323
      LISTEN =  116
   TIME_WAIT =   41
-------------------
       TOTAL =  550

The 40 foreign addresses of sockets of state CLOSE_WAIT:
Wouldn't you agree that there are far too many sockets in the FIN_WAIT_2 state? At least to me (though I don't have a networker's background) this looks screwed up.
TIA

In reply to Detecting and reaping stale sockets by SIGSEGV