Re: segmentation violations
by BrowserUk (Patriarch) on Sep 11, 2003 at 19:21 UTC
|
One way that might shed some light on the source of the problem would be to profile it using Devel::DProf.
If you set the size of the output buffer (PERL_DPROF_BUFFER) to something fairly small, say 4K, so that the profile data is flushed to disk frequently, then that ought to get you fairly close to where the failure occurs.
Once the failure eventually happens (your program will run very slowly), examining the quite probably huge output file (tmon.out by default) should tell you not only where it was and what (roughly) it was doing when the failure occurred, but also the sequence of events that led up to it.
I've used this technique to get me much closer to the source of the failure, and to give me hints on where to put tracing and/or set breakpoints. No guarantees that it won't change the point of failure to a completely different place, but even that can be a clue as to the cause.
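For anyone wanting to try this, here is a rough sketch of the invocation, not a tested recipe for your setup. It assumes Devel::DProf and its dprofpp tool are installed for the perl in question; the function names and the script name in the usage lines are placeholders of mine. The Devel::DProf documentation describes PERL_DPROF_BUFFER as a size in words, so 4096 is only an approximation of "4K".

```shell
# Sketch only: assumes Devel::DProf is installed for the perl being tested.
# The function names and "myscript.pl" are illustrative placeholders.

profile_with_small_buffer() {
    # A small PERL_DPROF_BUFFER forces the profiler to flush often,
    # so less profile data is lost when the process dies.
    PERL_DPROF_BUFFER=4096 "${PERL:-perl}" -d:DProf "$@"
}

show_call_tree() {
    # -T prints the subroutine call tree instead of timing statistics.
    # -F asks dprofpp to fake the exit timestamps that a profile cut
    #    short by a crash will be missing.
    "${DPROFPP:-dprofpp}" -F -T "${1:-tmon.out}"
}

# Usage (placeholder script name):
#   profile_with_small_buffer myscript.pl
#   show_call_tree | tail -n 50    # the last calls recorded before the crash
```

The PERL and DPROFPP variables are only there so that a different perl build can be swapped in without editing the functions.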
HTH.
P.S. I am not kidding when I say your program will run slowly! Don't sit and wait for it to fail. Start it running once you have your coat on and are leaving for the day (or going to bed or whatever). The output will be waiting for you in the morning :). Sitting and watching and waiting will drive you nuts.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.
|
|
I profiled the code, and found that it was crashing in DB_File::DoTie_. I checked my version (1.72) and then upgraded to the latest on CPAN (1.802).
When I ran the profile again, this time it crashed in one of my DEBUG routines. I ran it again, and it changed to one of my file locking routines. It seems to be jumping all over the place.
So, I jumped to a FreeBSD machine running Perl 5.6.1, checked out the code, re-ran the identical test, and it worked without any problems. Besides Perl 5.6.1, the other machine also has a newer version of Berkeley DB, so I am rebuilding the DB version on my machine to see if that makes a difference.
(Some time passes...) I changed from Berkeley DB 2.4.14 to 2.7.7 (to match one of the production machines), re-ran the script and sure enough, it moved the segmentation violation to yet another place in the code :-(
At this point, I have to go with the assumption that the problem is caused by Perl 5.6.0.
|
|
(Some time passes...)
Sorry! I did warn you it slowed things down:)
On the basis of what you have posted, I'd have to concur. It may well be that the 'fix' for 5.6.0 would be to downgrade your copy of DB_File rather than upgrade it, assuming the earlier version is still available, but if upgrading your build to 5.6.1 works, that is probably the easiest and safest route. There were lots of things fixed between 5.6.0 and 5.6.1, and your bug could well be another manifestation of one of those fixed bugs.
You might try searching the changes file and bug lists for 5.6.0/5.6.1 to see if anything there looks like it might be your problem. But even if you found it, all you would know is that you had to upgrade to get the fix (or possibly where to obtain a patch so that you could rebuild 5.6.0 to correct that one bug). In the end, if the upgrade does the trick, knowing why it does is just icing.
Good luck.
Re: segmentation violations
by Abigail-II (Bishop) on Sep 11, 2003 at 18:35 UTC
|
Well, if you don't know at all where the segmentation violation is happening, you could always try running the
program under a debugger such as gdb, or under valgrind. But making sense of the output will require some
knowledge of the perl internals.
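As a rough sketch of the gdb route (the function name and the script name in the usage line are placeholders, and this assumes gdb is installed), a non-interactive run that prints a C-level backtrace when the interpreter crashes might look like this. The frames will name functions inside perl or a compiled module, and they are easier to read if that perl was built with debugging symbols.

```shell
# Sketch only: "backtrace_on_crash" and "myscript.pl" are placeholders.

backtrace_on_crash() {
    # -batch   : exit once the listed commands have run
    # -ex run  : start the program
    # -ex bt   : print the backtrace after the program stops
    # --args   : everything that follows is the program and its arguments
    "${GDB:-gdb}" -batch -ex run -ex bt --args "${PERL:-perl}" "$@"
}

# Usage (placeholder script name):
#   backtrace_on_crash myscript.pl 2>&1 | tee gdb-backtrace.log
```

If the program exits normally, gdb simply reports that there is no stack to show.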
There are some known cases of segmentation violation in Perl.
Out of memory, and compilation of syntactically invalid
regexes are the two that I know of. But there can be a million
unknown cases. Unless you can isolate the problem to a small
piece of code, there isn't much we can do.
Of course, one of the things you can do is to see whether
you get the segmentation violation with 5.6.1 or 5.8.0. If
not, then the problem was already solved.
Abigail
|
|
Thanks for the reply.
I'll put the code through gdb. I've had limited experience with the perl source, as I did manage to get a Perl 4 version running on VMS many years ago.
I'd forgotten about 'out of memory' as a cause; I will definitely look into that. It's also really helpful to know that regexes can cause problems as well, although in this case I would think the segmentation violation would be 'repeatable'.
Unfortunately, the smaller/shorter versions of the code all work correctly. If I re-run the code when the underlying data is identical, the segmentation violation is repeatable, but any type of change to the code or the input causes it to move significantly, to earlier or later parts of the code.
The code is portable; I've run it on ActiveState's 5.6.1 for NT, and it worked correctly. That's one of the reasons why I think it's compiled code that sits inside one of the modules, but that could also imply that the problem is with 5.6.0. I use a large number of Perl modules, Storable being one, and an older version (2.x) of Berkeley DB. I haven't had any problems with the shorter, smaller versions; it's only when it all comes together and runs for a long period of time.
Thanks.
|
|
I checked memory and it appears the script is just slightly over 11 MB when it dies (which for my machine is very small). I am actually surprised to see the usage so low. Similar scripts that I've written for the same problem often exceed 40 MB.
I can't shorten the code and still get the segmentation violation, but I've found a set of data (and circumstances) that causes the crash to occur after about 10 minutes.
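With a reproducible data set like that, a small wrapper can confirm that each run really dies from SIGSEGV rather than from some other signal or an ordinary error exit. This is only a sketch; the function name and the script and data file names in the usage line are placeholders. It relies on the convention used by shells such as bash and dash that an exit status above 128 means the process was killed by signal number (status - 128).

```shell
# Sketch only: "run_and_classify" is an illustrative helper name.

run_and_classify() {
    "$@"
    rc=$?
    if [ "$rc" -gt 128 ]; then
        # Translate the signal number back into its name, e.g. 11 -> SEGV.
        sig=$(kill -l "$((rc - 128))")
        echo "killed by SIG$sig (status $rc)"
    else
        echo "exited normally (status $rc)"
    fi
    return "$rc"
}

# Usage (placeholder script and data file names):
#   run_and_classify perl myscript.pl crash-data.db > /dev/null
```

Running it several times against the same data, and again after a small change to the code or input, gives a quick record of whether the failure stays repeatable.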