Add to that the fault I'm chasing is intermittent

From experience, using a debugger is very unlikely to allow you to track down intermittent threading faults anyway. The very act of running code under debugger control inevitably changes the dynamics.

If you are exceptionally lucky, the change will make the problem occur more frequently and reliably, but in 20+ years of working with threaded code that happened exactly once. On every other occasion, even semi-reliable bugs would fail to manifest themselves at all under the debugger and reappear as soon as it was taken out of the picture.

In my experience, the first thing to do with intermittent bugs is make them reproducible. That means running the code in a controlled (repeatable) way in a production-like environment until the problem is reliably reproducible.

The next thing to do is track down what is going on, and where, when the problem occurs. That always means adding widespread, low-granularity, low-overhead logging.

Often the best logging mechanism is very simple, unformatted output (e.g. just the current thread ID and line number) sent with the UDP sendto() function to a local port. On the other end of that port, a program simply listens and logs the data to disk (preferably one not used by the monitored program; a USB thumb drive is ideal!) in a tight loop, with no attempt at interpreting it.
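To make that concrete, here is a minimal sketch of both ends in C, assuming POSIX sockets. The record layout (struct trace_rec), the function names and the max argument are all my own inventions for illustration; in practice the sender lives in the monitored program and the listener is a separate process that loops forever.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: thread id and line number, raw and unformatted. */
struct trace_rec {
    uint32_t thread_id;
    uint32_t line;
};

/* Fill in a loopback address for the given port. */
static void trace_addr(uint16_t port, struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Sender side: open a UDP socket aimed at 127.0.0.1:port. Returns fd or -1. */
static int trace_open(uint16_t port, struct sockaddr_in *dest)
{
    trace_addr(port, dest);
    return socket(AF_INET, SOCK_DGRAM, 0);
}

/* Sender side: fire one record at the listener. Returns 0 on success. */
static int trace_send(int fd, const struct sockaddr_in *dest,
                      uint32_t thread_id, uint32_t line)
{
    struct trace_rec rec = { thread_id, line };
    ssize_t n = sendto(fd, &rec, sizeof rec, 0,
                       (const struct sockaddr *)dest, sizeof *dest);
    return n == (ssize_t)sizeof rec ? 0 : -1;
}

/* Listener side: bind a UDP socket on 127.0.0.1:port. Returns fd or -1. */
static int trace_bind(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    trace_addr(port, &addr);
    if (bind(fd, (const struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Listener side: copy up to max datagrams verbatim to out, uninterpreted.
 * Returns the number of datagrams written. */
static int trace_drain(int fd, FILE *out, int max)
{
    char buf[512];
    int written = 0;
    while (written < max) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n <= 0)
            break;
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            break;
        written++;
    }
    return written;
}
```

The monitored program would call trace_open() once at startup and trace_send() at each point of interest; the listener would call trace_bind() and then trace_drain() with a file opened on the separate disk. UDP is fire-and-forget, so if the listener is not running the records are simply dropped rather than stalling the sender.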

This logging can be added to the code at (say) the entry and exit points of the major functions, with little or no impact on the performance or dynamics of the code being monitored. Once the error occurs, you can inspect that log to work out where each thread was when it occurred. You can then remove most of the logging and increase the granularity within those functions active when the bug manifests. Re-run and gradually 'zoom in' on the specific circumstances that cause the bug to arise.
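As a rough illustration of the entry/exit instrumentation, here is a sketch of a pair of C macros. All the names (TRACE_ENTER, TRACE_EXIT, trace_emit, parse_request) are hypothetical, and to keep the example self-contained trace_emit() appends to an in-memory array as a stand-in; in real use its body would be the sendto() call, and the record would also carry a thread ID.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TRACE_CAP 1024

/* One trace event: where it was emitted and whether it marks entry or exit. */
struct trace_evt {
    const char *func;   /* __func__ of the call site */
    uint32_t line;      /* __LINE__ of the call site */
    char kind;          /* '>' for entry, '<' for exit */
};

/* Stand-in sink (an assumption for this sketch): a fixed array plus an
 * atomic index, so concurrent threads each claim a distinct slot. */
static struct trace_evt trace_buf[TRACE_CAP];
static atomic_uint trace_next;

static void trace_emit(const char *func, uint32_t line, char kind)
{
    unsigned i = atomic_fetch_add(&trace_next, 1u);
    if (i < TRACE_CAP) {            /* silently drop events once full */
        trace_buf[i].func = func;
        trace_buf[i].line = line;
        trace_buf[i].kind = kind;
    }
}

/* The macros capture the call site so the instrumented code stays terse. */
#define TRACE_ENTER() trace_emit(__func__, __LINE__, '>')
#define TRACE_EXIT()  trace_emit(__func__, __LINE__, '<')

/* Hypothetical function under observation. */
static int parse_request(int x)
{
    TRACE_ENTER();
    int result = x * 2;
    TRACE_EXIT();
    return result;
}
```

When zooming in, the same macros can be sprinkled inside the suspect functions and removed from everywhere else. A function with several return paths needs a TRACE_EXIT() before each one.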

It may sound somewhat crude and slow, but with a little practice (and some well-crafted macros if you are using C), it is very effective. Once you know where each thread is (and therefore what it is doing) when the bug occurs, it is usually obvious where the problem lies.
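Working out where each thread was is mostly a matter of finding the last record per thread before the failure. A small sketch of that scan in C, assuming a hypothetical raw record of thread ID plus line number and an arbitrary cap of 64 threads:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw log record: thread id and line number. */
struct trace_rec {
    uint32_t thread_id;
    uint32_t line;
};

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

/* Most recent line seen for one thread. */
struct last_pos {
    uint32_t thread_id;
    uint32_t line;
};

/* Scan records in arrival order and keep the most recent line per thread.
 * out must have room for MAX_THREADS entries. Returns the number of
 * distinct threads found; threads beyond the cap are ignored. */
static size_t last_positions(const struct trace_rec *recs, size_t n,
                             struct last_pos *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < count && out[j].thread_id != recs[i].thread_id)
            j++;
        if (j == count) {               /* first time we see this thread */
            if (count == MAX_THREADS)
                continue;
            out[count++].thread_id = recs[i].thread_id;
        }
        out[j].line = recs[i].line;     /* later records overwrite earlier */
    }
    return count;
}
```

Feed it the records read back from the log file up to the point of failure, and the result is a per-thread "last known position" table to compare against the source.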

If your code is not proprietary, I'd be willing to take a look. No promises -- I probably couldn't run it here -- but I might spot something.




In reply to Re^5: threads->new falling in a heap. by BrowserUk
in thread threads->new falling in a heap. by Steve_BZ
