in reply to Failures in TCP/IP stack

So, it looks like bunnyman has found the correct source of the problem: leaking file descriptors. Way to catch that one!

Here is where it stands. This is the current check_database_health sub (connection part only):
    sub check_database_health {
        my $db_host = shift;
        my %health;

        debug(4, "Checking connect time for $db_host");

        my $dbh;
        my $connect_time;
        my $seconds = 2;

        my $mask   = POSIX::SigSet->new( &POSIX::SIGALRM );  # signals to mask in the handler
        my $action = POSIX::SigAction->new(
            sub { die "connect timeout" },                   # the handler code ref
            $mask,
        );
        my $oldaction = POSIX::SigAction->new();
        sigaction( &POSIX::SIGALRM, $action, $oldaction );

        eval {
            alarm($seconds);
            my $start_time = time;
            $dbh = DBI->connect("DBI:mysql:database=test;host=$db_host",
                                'check_health', '******',
                                { RaiseError => 1, PrintError => 0 })
                or die "Could not connect to $db_host: " . $DBI::errstr;
            my $end_time = time;
            $connect_time = $end_time - $start_time;
            alarm 0;    # cancel alarm (if code ran fast)
        };
        alarm 0;        # also cancel it when connect died before reaching the line above
        sigaction( &POSIX::SIGALRM, $oldaction );            # restore original signal handler

        if ( $@ ) {
            debug(4, "Problem!: $@");
            return undef;
        }
        else {
            $health{connect_time} = $connect_time;
            debug(4, "OK - $connect_time seconds");
        }
    }
I'm using this to get a connection and to log the time it takes to connect. If I cannot connect, I need to know that too.

A couple of the servers in my list are not running their databases right now, and every time DBI fails to connect to one of those machines, it leaves its file descriptor around ( 10 -> socket:16194434 ) forever. I am explicitly closing my connections when I have made one, but when DBI cannot connect at all, what do I need to do to ensure that those file descriptors get cleaned up?
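For anyone who wants to reproduce the leak, here is a minimal sketch of how the stray descriptors can be watched from inside the script. It assumes Linux with /proc mounted; open_fds and new_fds are hypothetical helper names and are not part of my actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: map each descriptor number held by this process
# to its target, by reading /proc/<pid>/fd (assumes Linux).
sub open_fds {
    my $dir = "/proc/$$/fd";
    opendir( my $dh, $dir ) or return;    # no /proc: return an empty list
    my %fds;
    for my $fd ( grep { /^\d+$/ } readdir $dh ) {
        my $target = readlink "$dir/$fd";
        next unless defined $target;      # descriptor vanished while scanning
        next if $target eq $dir;          # skip the handle opendir itself holds
        $fds{$fd} = $target;
    }
    closedir $dh;
    return %fds;
}

# Hypothetical helper: descriptor numbers present in the second snapshot
# but not in the first, in numeric order.
sub new_fds {
    my ( $before, $after ) = @_;
    return sort { $a <=> $b } grep { !exists $before->{$_} } keys %$after;
}
```

The idea is to take one snapshot before a connect attempt against a host that is down and another one after it, then print whatever new_fds reports:

```perl
my %before = open_fds();
# ... call check_database_health($db_host) here ...
my %after = open_fds();
print "leaked: $_ -> $after{$_}\n" for new_fds( \%before, \%after );
```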