"Add to that the fault I'm chasing is intermittent"
From experience, using a debugger is very unlikely to allow you to track down intermittent threading faults anyway. The very act of running code under debugger control inevitably changes the dynamics.
If you are exceptionally lucky, the change will make the problem occur more frequently and reliably, but in 20+ years of working with threaded code that happened exactly once. On every other occasion, even semi-reliable bugs would fail to manifest themselves at all under the debugger and reappear as soon as it was taken out of the picture.
In my experience, the first thing to do with intermittent bugs is make them reproducible. That means running the code in a controlled (repeatable) way in a production-like environment until the problem is reliably reproducible.
The next thing to do is track down what is going on, and where, when the problem occurs. And that always means adding widespread, coarse-grained, low-overhead logging.
Don't make the mistake of guessing where the problem might be and concentrating there. You're usually wrong!
There is no point in logging huge volumes of unrelated details. It just gives you more crap to wade through and, more importantly, can change the dynamics. That can cause the problem to move or disappear completely.
For example, printf (the C version) will usually need to allocate some temporary space for the formatting. That can cause your bug to 'move'.
Equally, thread-safe CRTs often employ internal locking when performing I/O (writing to files etc.). Again, that can cause the bug to move or disappear entirely -- until the logging is removed!
Often the best logging mechanism is a very simple, unformatted output (e.g. just the current thread ID and line number) sent with the UDP sendto() function to a local port. On the other end of that port you have a program that simply listens to the port and logs the data to disk (preferably a disk not used by the monitored program; a USB thumb-drive is ideal!), in a tight loop with no attempt at interpreting it.
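To make that concrete, here is a minimal sketch of both ends in Python rather than C or Perl. The port number, the function names (`trace`, `listen`) and the "thread-id line-number" record layout are my own assumptions for illustration, and `sys._getframe` is a CPython-specific way of getting the caller's line number. Treat it as a starting point, not a finished tool.

```python
import socket
import sys
import threading

# Hypothetical local port; any free UDP port would do.
LOG_ADDR = ("127.0.0.1", 50007)

# One datagram socket shared by all threads of the monitored program.
_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def trace(addr=LOG_ADDR):
    """Send 'thread-id line-number' for the caller as a single datagram."""
    line = sys._getframe(1).f_lineno  # caller's line (CPython-specific)
    _sock.sendto(b"%d %d\n" % (threading.get_ident(), line), addr)


def listen(sock, out_path, stop=None):
    """Listener side: append each datagram to out_path, uninterpreted.

    `sock` is an already-bound UDP socket. `stop` is an optional
    threading.Event; it is only checked when recvfrom() returns or times
    out, so give the socket a timeout if you want to be able to stop it.
    """
    with open(out_path, "ab", buffering=0) as out:
        while stop is None or not stop.is_set():
            try:
                data, _ = sock.recvfrom(2048)
            except socket.timeout:
                continue
            out.write(data)
```

In real use the listener would live in a separate process: bind a UDP socket to the chosen port, call `listen(sock, path)`, and sprinkle `trace()` calls through the program being monitored. If no listener is running, the datagrams are simply dropped.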
This logging can be added to the code at (say) the entry and exit points of the major functions, with little or no impact on the performance or dynamics of the code being monitored. Once the error occurs, you can inspect that log to work out where each thread was when it occurred. You can then remove most of the logging and increase the granularity within those functions active when the bug manifests. Re-run and gradually 'zoom in' on the specific circumstances that cause the bug to arise.
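Inspecting the log can be as simple as finding the last record each thread wrote before things went wrong. A rough sketch, assuming each record is a text line of the form "thread-id line-number" (that layout and the function name `last_positions` are my assumptions, not anything standard):

```python
def last_positions(records):
    """Return {thread_id: last line number reported by that thread}."""
    last = {}
    for rec in records:
        parts = rec.split()
        if len(parts) != 2:
            continue  # skip empty or torn records
        tid, line = parts
        try:
            last[tid] = int(line)
        except ValueError:
            continue  # skip records whose line field is not a number
    return last
```

Given the records `["7 120", "9 48", "7 131"]`, this reports thread 7 at line 131 and thread 9 at line 48, which tells you which functions deserve the finer-grained logging on the next run.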
It may sound somewhat crude and slow, but with a little practice (and some well-crafted macros if you are using C), it is very effective. Once you know where each thread is (and therefore what it is doing) when the bug occurs, it is usually obvious where the problem lies.
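In a language without macros, a decorator can play a similar role for the entry and exit points. The following is only a sketch under my own assumptions: the name `traced`, the `>` / `<` markers and the port are all hypothetical, and a listener is assumed to be running on that port.

```python
import functools
import socket
import threading

# Hypothetical local port where a listener is assumed to be running.
LOG_ADDR = ("127.0.0.1", 50007)

_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def traced(addr=LOG_ADDR):
    """Decorator factory: log 'tid > name' on entry and 'tid < name' on exit."""
    def decorate(func):
        # Build the constant parts once, so each call does minimal work.
        enter = b"> " + func.__name__.encode()
        leave = b"< " + func.__name__.encode()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tid = b"%d " % threading.get_ident()
            _sock.sendto(tid + enter, addr)
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on normal return and on exceptions alike.
                _sock.sendto(tid + leave, addr)
        return wrapper
    return decorate
```

Put `@traced()` on the major functions first. Once the log shows which ones are active when the bug hits, take the decorator off the rest and add finer-grained trace points inside the suspects.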
If your code is not proprietary, I'd be willing to take a look. No promises -- I probably couldn't run it here -- but I might spot something.