disappearing segfault

cfreeman has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: disappearing segfault by Joost (Canon) on Mar 10, 2009 at 22:45 UTC
Given your description it might not work, but it probably won't hurt having a go at running your program through valgrind with various permutations of its options. Also: these kinds of problems can indicate a buggy version of the perl interpreter. Which perl version on what OS are you using exactly? If you have a vendor-supplied perl version (instead of a self-compiled official release) that might be significant too (and compiling your own could fix your problems). For instance, last summer it was discovered that RedHats version of perl 5.8 had a really nasty (but not fatal) bug in its operator overloading mechanism. "What should it profit a man, if he should win a flame war, yet lose his cool?"	[reply]
Re: disappearing segfault by tilly (Archbishop) on Mar 11, 2009 at 01:10 UTC
I can tell you what the cause of the bug is, but you'll still have a lot of work to solve it. The problem is that some piece of C level code is using a pointer that is pointing where it shouldn't. Therefore it is overwriting some other memory. When you try to use that other memory, something that should be impossible (generally a segfault) happens. The trouble with this kind of bug is that the place where it shows up has nothing to do with where the bug really is. The bug could be in any C code, anywhere. The second problem is that anything that changes the internal memory layout of your program (eg adding print or warn statements to the file) will change what memory gets overwritten by the stray pointer and it apparently goes away. In fact the second characteristic is an indicator that this is the type of bug you are facing. To fix it you either must track down the bug, or you must change the memory layout so that it doesn't seem to be biting you. The latter is easy - add some code until it doesn't seem to be a problem and pray. It is also unreliable. The former can be hard. Really hard. The best solution is a detailed code audit of all C code involved. I would suggest with the C libraries that you load because they are likely the least tested piece. This is sufficiently difficult that people have written tools to automate this process. Valgrind has been mentioned. More specialized tools like Coverity may be more appropriate (if more expensive). However you tackle the problem, good luck.	[reply]
Re: disappearing segfault by Perlbotics (Archbishop) on Mar 10, 2009 at 21:12 UTC
Hi, just some ideas ... I understand that your program behaves different when executed from a local shell or when started remotely. I had a similar problem once that was solved by setting the environment variable `LANG=C` because some localisations didn't work - but this is just a guess. You could compare the environments (run `set` locally and when logged in via ssh) run `strace perl yourprogram.pl arguments` and see what is executed before segfault occurs Not much, but maybe a starter...	[reply] [d/l] [select]
Re: disappearing segfault by Illuminatus (Curate) on Mar 10, 2009 at 21:09 UTC
Did you try to run strace on it? It is a longshot, since most other alterations change the behavior, but it is worth a try. When the seg fault 'goes away', is the hardware still initialized correctly? To me: `if ($CustomHardware && $ErraticSegFaultBasedOnConditions) { checkTimingConditionsConcerningHardware(); }` [download]	[reply] [d/l]
Re: disappearing segfault (XS) by tye (Sage) on Mar 11, 2009 at 03:38 UTC
By far, the C code that is most likely to contain the bug is the XS code. I'd review the XS code for suspect constructs. But being able to do that with XS code is a fairly rare talent, just one of the reasons why I strongly discourage the writing of XS code. But that at least provides some focus for working on the problem. Note that you should be able to configure your system such that a core dump is produced and then use that to get a gdb session for poking around at the state of things when the failure happened. But even that can mean quite difficult work ahead. Switching to a different version, particularly of the XS code but even of Perl or other components, can be particularly helpful in making such problems disappear (perhaps only by once again hiding the bug). - tye	[reply]
Re^2: disappearing segfault (XS) by BrowserUk (Patriarch) on Mar 11, 2009 at 08:01 UTC
To clarify. The addition of a print statement will not 'fix' the problem, but may defer it. You should be able to use that to help you track down the errant code. Starting with some assumptions. If your program normally terminates via an exit statement, and you place a print statement after the exit, it is unlikely--not impossible, but quite unlikely--to change the behaviour. So start by placing a single print at the end of the program just before it would normally exit. Does it still segfault in the failing scenario? If so, move it up the execution order a little and try again. Often that will allow you to track down the failure to the line in the main code that causes the failure. When the print is after that line, the segfault occurs. When it is in front of it, the segfault is deferred, because the presence of the print statement has changed the state of the stack or heap or whatever such that whatever corruption is causing the segfault isn't encountered. If the line is a call to a subroutine in another module, move the print statement to the end of that subroutine and move it backward until the segfault goes away again, and so on until you get to a call to an C or XS function. At that point you can often get closer to the point of failure by adding trace within the C/XS code. But it is a whole lot easier if you've managed to isolate which C/XS function is the cause. No guarantees of course, but it's usually easier than trying to unravel a post-mortem dump. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re: disappearing segfault by cfreeman (Novice) on Mar 11, 2009 at 14:18 UTC
Hi everyone, Thanks so much for all the suggestions. I'm going to give these a try and I'll post back when I have something working.	[reply]
Re: disappearing segfault by BrowserUk (Patriarch) on Mar 11, 2009 at 03:59 UTC
If - adding any print or warn statement to the file "make the seg fault disappear", why not just add a harmless print statement? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^2: disappearing segfault by davido (Cardinal) on Mar 11, 2009 at 06:26 UTC
Because it's unreliable. Imagine someone shooting a BB gun blindfolded. He keeps hitting you in the butt and it's darn uncomfortable. So you move ten inches to the right. The BB's continue whizzing past you, "thwap, thwap, thwap..." Then the shooter scratches an itch and resumes shooting, but this time he hits you again. Or you lean to the side to pick something up... you get hit again. Or someone walks up to talk to you, and he gets hit. Just because you're not currently getting hit doesn't mean the BB's aren't hitting anything anymore. You're just not the one getting hit. Adding the print statement might stop you from getting bit by the bug, but it doesn't mean the bug isn't biting anymore. Some minor change elsewhere in the code could cause the stars to align again and you start getting bit again. Just a thought.... Dave	[reply]
Re^3: disappearing segfault by BrowserUk (Patriarch) on Mar 11, 2009 at 06:48 UTC
The point is that if the print statement truly fixed the problem, then it would be fixed. By definition. We both know that is unlikely. The reality is (most likely) that the addition of a print statement defers the symptom. And that realisation of itself is a useful one. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^2: disappearing segfault by pobocks (Chaplain) on Mar 11, 2009 at 05:31 UTC
Because building secret magic into your program is sloppy, and could easily bite him in the rear in time. Quick example: he adds the print statement. Someone inherits the code. They do some clean-up, kill the print statement. Boom, code segfaults for no apparent reason. It's just a bad idea to paper over potentially (and in this case, ACTUALLY) fatal, hard-to-trace errors instead of fixing them. `for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";`	[reply] [d/l]
Re^3: disappearing segfault by BrowserUk (Patriarch) on Mar 11, 2009 at 06:08 UTC
Why does it have to be "secret"? This would be a perfect example of when to legitimately use that mercilessly overused construct: a source code comment. Calm your indignation and try to think laterally. What other purposee might my suggestion serve? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^4: disappearing segfault by pobocks (Chaplain) on Mar 13, 2009 at 06:31 UTC
Re^5: disappearing segfault by BrowserUk (Patriarch) on Mar 13, 2009 at 06:47 UTC
Some notes below your chosen depth have not been shown here


No such thing as a small change
	PerlMonks