PerlMonks  

disappearing segfault

by cfreeman (Novice)
on Mar 10, 2009 at 20:25 UTC ( [id://749716]=perlquestion )

cfreeman has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I've been having a bit of a problem with some scripts I am running, and I was wondering if someone could maybe just give me a nudge in the right direction.

I have a set of Perl scripts that are used to set up and configure a piece of custom hardware. The scripts use an XS module that plugs into some C libraries that communicate with the hardware. The script has been giving me a segfault lately, but only under very specific conditions. If I run the script directly from the console (Linux) or through a PuTTY session, it works fine: the hardware gets correctly initialized and everything. If I run the script by sshing in from Windows with Cygwin ssh, it segfaults. However, debugging this problem has been quite difficult because any change to the file or how it is run seems to make the segfault go away. The following make the segfault disappear:
- adding any print or warn statement to the file
- hard-coding any function value in the file
- running perl with the perl debugger (-d option)
- running perl with the -u option to dump a core file
- running perl in gdb

Any suggestions on what this could be or how I could go about debugging something like this would be greatly appreciated.

Replies are listed 'Best First'.
Re: disappearing segfault
by Joost (Canon) on Mar 10, 2009 at 22:45 UTC
    Given your description it might not work, but it probably won't hurt to try running your program under valgrind with various combinations of its options.

    Also: these kinds of problems can indicate a buggy version of the perl interpreter. Which perl version on what OS are you using exactly? If you have a vendor-supplied perl version (instead of a self-compiled official release) that might be significant too (and compiling your own could fix your problems). For instance, last summer it was discovered that Red Hat's version of perl 5.8 had a really nasty (but not fatal) bug in its operator overloading mechanism.

Re: disappearing segfault
by tilly (Archbishop) on Mar 11, 2009 at 01:10 UTC
    I can tell you what the cause of the bug is, but you'll still have a lot of work to solve it.

    The problem is that some piece of C level code is using a pointer that is pointing where it shouldn't. Therefore it is overwriting some other memory. When you try to use that other memory, something that should be impossible (generally a segfault) happens.

    The trouble with this kind of bug is that the place where it shows up has nothing to do with where the bug really is. The bug could be in any C code, anywhere. The second problem is that anything that changes the internal memory layout of your program (e.g. adding print or warn statements to the file) will change what memory gets overwritten by the stray pointer, and the bug appears to go away. In fact this second characteristic is an indicator that this is the type of bug you are facing.
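A minimal C sketch of the effect described above, not the poster's actual code: the names buggy_fill and victim_corrupted and the arena layout are hypothetical. To keep the demonstration well defined, the buffer and its neighbour both live inside one array, so the "overrun" never leaves memory the program owns. A real stray pointer gives no such guarantee.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARENA_SIZE 32
#define BUF_SIZE   8   /* the buffer the buggy code thinks it owns */

/* Buggy writer (hypothetical): it trusts `claimed` and fills that many
 * bytes. If `claimed` exceeds BUF_SIZE, the extra bytes land on whatever
 * happens to sit after the buffer. */
void buggy_fill(unsigned char *buf, size_t claimed)
{
    memset(buf, 0xFF, claimed);
}

/* Place the buffer at offset 0 and a one-byte "victim" at victim_offset,
 * run the buggy writer, and report whether the victim was overwritten.
 * Returns 1 if the victim changed, 0 if it survived. */
int victim_corrupted(size_t victim_offset, size_t claimed)
{
    unsigned char arena[ARENA_SIZE];

    /* Keep every access inside the arena so the demo stays well defined. */
    assert(victim_offset >= BUF_SIZE && victim_offset < ARENA_SIZE);
    assert(claimed <= ARENA_SIZE);

    memset(arena, 0, sizeof arena);
    arena[victim_offset] = 42;          /* value some other code relies on */

    buggy_fill(arena, claimed);         /* the actual bug runs here */

    return arena[victim_offset] != 42;  /* the symptom shows up here */
}

#ifdef STRAY_WRITE_DEMO_MAIN
int main(void)
{
    /* Victim directly after the buffer: a 4-byte overrun hits it. */
    printf("adjacent layout: corrupted=%d\n", victim_corrupted(8, 12));
    /* Same bug, victim moved 8 bytes further away: the symptom vanishes. */
    printf("shifted layout:  corrupted=%d\n", victim_corrupted(16, 12));
    return 0;
}
#endif
```

Compiled with -DSTRAY_WRITE_DEMO_MAIN, this should report corruption for the adjacent layout and none for the shifted one, even though the buggy call is identical in both cases. That is the same reason an added print statement can hide the symptom without removing the bug.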

    To fix it you either must track down the bug, or you must change the memory layout so that it doesn't seem to be biting you. The latter is easy: add some code until it doesn't seem to be a problem, and pray. It is also unreliable. The former can be hard. Really hard. The best solution is a detailed code audit of all C code involved. I would suggest starting with the C libraries that you load, because they are likely the least tested piece. This is sufficiently difficult that people have written tools to automate this process. Valgrind has been mentioned. More specialized tools like Coverity may be more appropriate (if more expensive).

    However you tackle the problem, good luck.

Re: disappearing segfault
by Perlbotics (Archbishop) on Mar 10, 2009 at 21:12 UTC

    Hi, just some ideas ... I understand that your program behaves differently when executed from a local shell than when started remotely. I had a similar problem once that was solved by setting the environment variable LANG=C because some localisations didn't work - but this is just a guess.

    You could

    • compare the environments (run "set" locally and again when logged in via ssh, then compare the output)
    • run "strace perl yourprogram.pl arguments" and see what is executed before the segfault occurs
    Not much, but maybe a starter...

Re: disappearing segfault
by Illuminatus (Curate) on Mar 10, 2009 at 21:09 UTC
    Did you try to run strace on it? It is a long shot, since most other alterations change the behavior, but it is worth a try. When the segfault 'goes away', is the hardware still initialized correctly?
    To me:
    if ($CustomHardware && $ErraticSegFaultBasedOnConditions) { checkTimingConditionsConcerningHardware(); }
Re: disappearing segfault (XS)
by tye (Sage) on Mar 11, 2009 at 03:38 UTC

    By far, the C code that is most likely to contain the bug is the XS code. I'd review the XS code for suspect constructs. But being able to do that with XS code is a fairly rare talent, just one of the reasons why I strongly discourage the writing of XS code. But that at least provides some focus for working on the problem.

    Note that you should be able to configure your system such that a core dump is produced and then use that to get a gdb session for poking around at the state of things when the failure happened. But even that can mean quite difficult work ahead.

    Switching to a different version, particularly of the XS code but even of Perl or other components, can be particularly helpful in making such problems disappear (perhaps only by once again hiding the bug).

    - tye        

      To clarify. The addition of a print statement will not 'fix' the problem, but may defer it. You should be able to use that to help you track down the errant code.

      Starting with some assumptions. If your program normally terminates via an exit statement, and you place a print statement after the exit, it is unlikely--not impossible, but quite unlikely--to change the behaviour.

      So start by placing a single print at the end of the program just before it would normally exit. Does it still segfault in the failing scenario?

      If so, move it up the execution order a little and try again. Often that will allow you to track down the failure to the line in the main code that causes the failure.

      When the print is after that line, the segfault occurs. When it is in front of it, the segfault is deferred, because the presence of the print statement has changed the state of the stack or heap or whatever such that whatever corruption is causing the segfault isn't encountered.

      If the line is a call to a subroutine in another module, move the print statement to the end of that subroutine and move it backward until the segfault goes away again, and so on until you get to a call to a C or XS function.

      At that point you can often get closer to the point of failure by adding trace within the C/XS code. But it is a whole lot easier if you've managed to isolate which C/XS function is the cause.

      No guarantees of course, but it's usually easier than trying to unravel a post-mortem dump.
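One way to add such trace on the C side, as a rough sketch: trace_line and the TRACE macro are made-up names, not part of XS or of any library mentioned in this thread. The one detail that matters is flushing after every line, because buffered output is lost when the process dies on a segfault. Bear in mind that adding trace changes the program too, so the symptom may move again.

```c
#include <assert.h>
#include <stdio.h>

/* Write one trace line to `out` and flush immediately, so the line
 * survives even if the process segfaults on the very next statement.
 * Returns the number of characters written (as fprintf does). */
int trace_line(FILE *out, const char *file, int line, const char *msg)
{
    int n;

    assert(out != NULL);
    n = fprintf(out, "TRACE %s:%d %s\n", file, line, msg);
    fflush(out);
    return n;
}

/* Convenience wrapper (hypothetical): records where the trace call sits. */
#define TRACE(msg) trace_line(stderr, __FILE__, __LINE__, (msg))

#ifdef TRACE_DEMO_MAIN
int main(void)
{
    TRACE("before hardware init");  /* last line seen points near the crash */
    TRACE("after hardware init");
    return 0;
}
#endif
```

Sprinkling TRACE calls around the suspect function and noting the last line that reaches stderr narrows the search in the same way as moving the print statement in the Perl code.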


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: disappearing segfault
by cfreeman (Novice) on Mar 11, 2009 at 14:18 UTC
    Hi everyone,

    Thanks so much for all the suggestions. I'm going to give these a try and I'll post back when I have something working.
Re: disappearing segfault
by BrowserUk (Patriarch) on Mar 11, 2009 at 03:59 UTC

      Because it's unreliable.

      Imagine someone shooting a BB gun blindfolded. He keeps hitting you in the butt and it's darn uncomfortable. So you move ten inches to the right. The BB's continue whizzing past you, "thwap, thwap, thwap..." Then the shooter scratches an itch and resumes shooting, but this time he hits you again. Or you lean to the side to pick something up... you get hit again. Or someone walks up to talk to you, and he gets hit. Just because you're not currently getting hit doesn't mean the BB's aren't hitting anything anymore. You're just not the one getting hit.

      Adding the print statement might stop you from getting bit by the bug, but it doesn't mean the bug isn't biting anymore. Some minor change elsewhere in the code could cause the stars to align again and you start getting bit again.

      Just a thought....


      Dave

        The point is that if the print statement truly fixed the problem, then it would be fixed. By definition.

        We both know that is unlikely. The reality is (most likely) that the addition of a print statement defers the symptom. And that realisation of itself is a useful one.



      Because building secret magic into your program is sloppy, and could easily bite him in the rear in time.

      Quick example: he adds the print statement. Someone inherits the code. They do some clean-up, kill the print statement. Boom, code segfaults for no apparent reason.

      It's just a bad idea to paper over potentially (and in this case, ACTUALLY) fatal, hard-to-trace errors instead of fixing them.

      for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";

        1. Why does it have to be "secret"? This would be a perfect example of when to legitimately use that mercilessly overused construct: a source code comment.
        2. Calm your indignation and try to think laterally. What other purpose might my suggestion serve?


Node Type: perlquestion [id://749716]
Approved by Corion