in reply to disappearing segfault

By far, the C code most likely to contain the bug is the XS code, so I'd review it for suspect constructs. Being able to do that with XS code is a fairly rare talent, which is just one of the reasons why I strongly discourage writing XS code. But that at least provides some focus for working on the problem.

Note that you should be able to configure your system such that a core dump is produced and then use that to get a gdb session for poking around at the state of things when the failure happened. But even that can mean quite difficult work ahead.

Switching to a different version, particularly of the XS code but even of Perl or other components, can be quite effective at making such problems disappear (perhaps only by once again hiding the bug).

- tye        

Re^2: disappearing segfault (XS)
by BrowserUk (Patriarch) on Mar 11, 2009 at 08:01 UTC

    To clarify: adding a print statement will not 'fix' the problem, but it may defer it. You should be able to use that to help you track down the errant code.

    Start with an assumption: if your program normally terminates via an exit statement, and you place a print statement after the exit, it is unlikely--not impossible, but quite unlikely--to change the behaviour.

    So start by placing a single print at the end of the program just before it would normally exit. Does it still segfault in the failing scenario?

    If so, move it up the execution order a little and try again. Often that will allow you to track down the failure to the line in the main code that causes the failure.

    When the print is after that line, the segfault occurs. When it is in front of it, the segfault is deferred, because the presence of the print statement has changed the state of the stack or heap or whatever such that whatever corruption is causing the segfault isn't encountered.

    If the line is a call to a subroutine in another module, move the print statement to the end of that subroutine and move it backward until the segfault goes away again, and so on until you get to a call to a C or XS function.

    At that point you can often get closer to the point of failure by adding trace within the C/XS code. But it is a whole lot easier if you've managed to isolate which C/XS function is the cause.
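
    As a rough illustration of adding trace within the C code, here is a minimal sketch. The TRACE macro and the sum_bytes function are hypothetical names invented for this example, not part of any real module. The idea is to write to stderr and flush after every message, so the last line printed before a segfault shows how far execution got.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical trace macro (name and format are illustrative only).
 * It prints file, line and function, then the caller's message, and
 * flushes stderr so nothing is lost if the process dies right after. */
#define TRACE(...)                                                       \
    do {                                                                 \
        fprintf(stderr, "%s:%d (%s): ", __FILE__, __LINE__, __func__);   \
        fprintf(stderr, __VA_ARGS__);                                    \
        fputc('\n', stderr);                                             \
        fflush(stderr);                                                  \
    } while (0)

/* Hypothetical stand-in for a C function called from XS glue code. */
static int sum_bytes(const unsigned char *buf, size_t len)
{
    int total = 0;
    size_t i;

    /* Log the arguments on entry; a bad pointer or length often shows up here. */
    TRACE("enter buf=%p len=%zu", (const void *)buf, len);
    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++)
        total += buf[i];

    /* If this line never appears, the crash happened inside the loop. */
    TRACE("leave total=%d", total);
    return total;
}
```

    Calling sum_bytes on the three bytes of "abc" would print an "enter" line and a "leave" line on stderr. Bear in mind that, just like a print in the Perl code, added trace can itself shift the stack or heap enough to move or hide the failure.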

    No guarantees of course, but it's usually easier than trying to unravel a post-mortem dump.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.