It is all over after the first 3 minutes. It's not a "hang"; it's just running really slowly because it cannot get new sockets for connections.
You are creating huge numbers of connections -- there are 6,897 open sockets when your program starts, presumably still timing out from previous runs -- but rather than being shut down cleanly, those connections are going into TIME_WAIT state, and your server then has to wait for one of them to time out (typically 2 to 4 minutes on Windows) before it can establish a new connection.
MST  Processor Time  Elapsed Time  Thread Count  Working Set  Active  Established  Passive  Reset
18:45:38.609  0  0  0  0  6897  6  6062  505
18:45:48.609  5.9687  1  12566528  6897  5  6062  505
18:45:58.609  16.0937  15.96875  5  40304640  6897  8  6081  505
18:46:08.609  23.4375  25.96875  13  78077952  6897  60  6166  505
18:46:18.609  38.9062  35.96875  12  109629440  6898  15  6185  505
18:46:28.625  52.1060  45.984375  8  94683136  6898  11  6185  505
18:46:38.640  59.5943  56  2  21966848  6898  5  6185  505
18:46:48.656  15.6006  66.015625  11  58703872  6898  47  6259  505
18:46:58.656  27.5000  76.015625  41  283103232  6898  69  6309  506
18:47:08.671  17.0046  86.03125  41  475500544  6898  69  6309  506
18:47:18.671  14.8437  96.03125  41  536985600  6898  69  6309  506
18:47:28.671  0  106.03125  41  536985600  6898  69  6309  506
18:47:38.671  0  116.03125  41  536985600  6899  69  6309  506
18:47:48.671  0  126.03125  41  536985600  6899  119  6359  506
18:47:58.671  0  136.03125  41  536985600  6899  192  6432  506
18:48:08.671  0  146.03125  41  536985600  6899  192  6432  506
18:48:18.671  0  156.03125  41  536985600  6899  192  6432  506
18:48:28.671  0  166.03125  41  536985600  6899  193  6433  506
18:48:38.671  0  176.03125  41  536985600  6899  193  6433  506
18:48:48.671  0  186.03125  41  536985600  6899  193  6433  506
18:48:58.671  0  196.03125  41  536985600  6899  191  6433  508
18:49:08.671  0  206.03125  41  536985600  6899  192  6434  508
18:49:18.671  0  216.03125  41  536985600  6899  192  6434  508
18:49:28.671  0  226.03125  41  536985600  6899  241  6535  508
18:49:38.671  0  236.03125  41  536985600  6899  241  6558  508
18:49:48.671  0  246.03125  41  536985600  6899  240  6559  509
(If you are going to throw another set of data at us, how about you expend a little energy to make the csv data readable :)
There is either:
(I haven't spotted it yet, but without running and tracing it can be hard to spot);
The debug log -- had it not been empty -- might have helped.
Basically, you are in a "dead man's shoes" situation, where your server cannot establish (or probably even accept) a new connection, until one of the existing dying-but-still-to-finally-die connections times out.
Before you do another run, you need to clean up any existing connections. There is the netsh command that allows you to reset at various levels -- winsock; interface; ipv4; tcp etc. -- but perhaps the simplest is to just reboot the machine.
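To see whether the machine is actually clean before the next run, it may help to tally connections by TCP state. The following is only a rough sketch in Python (not taken from the code in this thread); the function names are made up for illustration, and it assumes `netstat -an` prints TCP rows whose last field is the state, as it does on Windows and Linux.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connections by state from `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows end with the state (e.g. ESTABLISHED, TIME_WAIT);
        # UDP rows and header lines are skipped.
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            states[fields[-1]] += 1
    return states


def current_tcp_states() -> Counter:
    """Run `netstat -an` (assumed to be on PATH) and tally its output."""
    result = subprocess.run(["netstat", "-an"], capture_output=True, text=True)
    return count_tcp_states(result.stdout)
```

Calling `current_tcp_states()` before and after a test run and comparing the `TIME_WAIT` counts should show whether old connections are still draining.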
You need to work out what is causing the connections to 'linger'. You appear to be using shutdown correctly -- at the server end at least -- and closing the filehandles; but something is preventing them from being reused immediately, despite your ReuseAddr setting on the listener.
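One place to start looking: in TCP, the end that closes first is the one left holding the connection in TIME_WAIT, and SO_REUSEADDR on the listener applies to binding the listening port, not to each accepted connection. The sketch below is Python rather than Perl and is not the code from this thread; `make_listener` and `close_after_peer` are hypothetical names. It shows a server that waits for the client's end-of-file before closing, assuming the protocol lets the client close first.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Listening socket with SO_REUSEADDR set before bind()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(16)
    return srv


def close_after_peer(conn, timeout=5.0):
    """Drain until the peer's EOF, then close. Returns True if EOF was seen."""
    conn.settimeout(timeout)
    saw_eof = False
    try:
        while True:
            if not conn.recv(4096):  # b"" means the peer has closed its side
                saw_eof = True
                break
    except OSError:  # timeout or connection reset
        pass
    finally:
        conn.close()
    return saw_eof
```

If the server always closes first under heavy load, the TIME_WAIT entries pile up on the server; whether that is what is happening here would need to be confirmed by tracing.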
Not a solution, but maybe it will give you some clues about where to start looking and how.
In reply to Re^7: Multithreaded Server Crashes under heavy load
by BrowserUk
in thread Multithreaded Server Crashes under heavy load
by rmahin