I had a problem today which boiled down to this coredump:

perl -we 'sub b { my $a; *b = \&c; b() } sub c { die "here" } b()'
This is a problem introduced in perl-5.8.1 and fixed in perl-5.8.4 and later.

It took me a long time to get to the point of determining the problem, so I thought I'd just let y'all know about it. :)

Various factors conspired to make this take an awfully long time to debug. It was initially triggered by a CGI application, and Apache for some reason never dropped me a core file (I'm not sure why: on my development box it's running as 'hv', so perhaps a permissions problem?). The problem was also triggered deep down the call chain, while initialising our custom application logger, which does nasty things with STDERR and WARN/DIE handlers. So my initial assumption was that STDERR was in a half-initialised state and that this was losing me the error message.

I eventually tracked down the actual error message, a Carp::confess of an SQL error, and discovered that changing it to cluck() instead did allow the error message to get printed. However, I'd already convinced myself too firmly that no coredump had occurred, so I spent a long time trying to track down what might cause the difference in behaviour, still concentrating primarily on the state of STDERR. That was compounded by the red herring of some incorrect information in the Carp::longmess call stack, which I eventually discovered was endemic (even in the latest development perls) and probably benign.

I had only half-heartedly tried to reproduce the problem from the command line, since the application logging module is very CGI-specific. Eventually I discovered that I had neglected the cookies still being sent by the browser (from a previous testing session), which were causing it to go off and do some user authentication. As soon as I could properly reproduce the environment for a command-line invocation, I was finally able to see that the problem was a SEGV.

So I took a copy of the entire application environment, and started cutting down the code. Once I reached this method it quickly became clear I'd found the culprit (some names changed):

package NVC::DB::User;

sub bless {
    my $self = shift;
    my $config = $self->config;
    # do this only once, on first call
    no warnings qw/ redefine /;
    if ($config->extend_user('type 1')) {
        *bless = \&bless_type1;
    } elsif ($config->extend_user('type 2')) {
        *bless = \&bless_type2;
    } else {
        *bless = \&bless_simple;
    }
    return $self->bless(@_);
}

I knew the original confess() was triggered a few levels down from the bless_type2() method, and from this point I could easily start constructing one-liners to reduce the problem to the minimal case above.
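
For comparison, here is a minimal sketch of the same "choose once, on first call" idea that never replaces the sub while it is still running: the chosen implementation is cached in a lexical and called through a code reference. My::Config, My::User and _choose_bless are hypothetical stand-ins invented for the example, not the real application code, and this is untested on the affected perls.

```perl
use strict;
use warnings;

package My::Config;    # hypothetical stand-in for the application's config object
sub new         { my ($class, %on) = @_; return CORE::bless({ on => \%on }, $class) }
sub extend_user { my ($self, $type) = @_; return $self->{on}{$type} }

package My::User;      # hypothetical stand-in for NVC::DB::User
sub new    { my ($class, $config) = @_; return CORE::bless({ config => $config }, $class) }
sub config { return $_[0]{config} }

# Placeholder implementations, just so the example has something to call.
sub bless_type1  { return 'type 1' }
sub bless_type2  { return 'type 2' }
sub bless_simple { return 'simple' }

{
    my $impl;    # chosen implementation, cached after the first call

    sub bless {
        my $self = shift;
        $impl ||= $self->_choose_bless;
        # The running sub is never redefined; we only call through
        # the cached code reference.
        return $self->$impl(@_);
    }
}

# Hypothetical helper: same decision as the original if/elsif chain.
sub _choose_bless {
    my $self   = shift;
    my $config = $self->config;
    return \&bless_type1 if $config->extend_user('type 1');
    return \&bless_type2 if $config->extend_user('type 2');
    return \&bless_simple;
}

package main;

my $user = My::User->new(My::Config->new('type 2' => 1));
print $user->bless, "\n";    # prints "type 2"
```

As with the glob assignment, the choice is made once per process rather than once per object; the cost is one extra indirection on every call.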

Lessons: when you get new information, re-examine your assumptions ("there is no core"); put the extra effort in to make even your CGI-specific modules easy to use from the command-line (I finally knocked up a printenv.cgi to see how the cookies were passed through).
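
A printenv.cgi need be no more than a few lines. This is a minimal sketch rather than the script I actually used; format_env is a name made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Render an environment hash as sorted NAME=value lines.
sub format_env {
    my ($env) = @_;
    return join '', map { "$_=$env->{$_}\n" } sort keys %$env;
}

# Dump the CGI environment as plain text so it can be copied
# into a command-line invocation.
print "Content-Type: text/plain\r\n\r\n";
print format_env(\%ENV);
```

Under CGI the browser's cookies arrive in HTTP_COOKIE, so setting that variable (along with REQUEST_METHOD, QUERY_STRING and friends) in the shell before running the real script is what reproduces the browser's request from the command line.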

Another possible one: automate the process of making new perl versions application-ready. (The application has a number of module dependencies which I've mostly managed to document now, but installing them all still takes long enough that I tend to keep just one perl version up to date with them all; if it were automated, I'd have been much quicker to try the problem case under the latest maintenance release, and seeing correct behaviour there would have saved me a lot of time in the debugging process.)
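
A first step towards that automation could be a script that reports which dependencies a given perl is still missing. This is only a sketch: missing_modules is a hypothetical helper, and the module list is whatever you pass on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the subset of the named modules that fail to load under
# the perl running this script.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # The name is interpolated into a string eval, so anything that
        # is not a plain module name is reported as missing, not loaded.
        my $ok = $module =~ /\A\w+(?:::\w+)*\z/
            && eval "require $module; 1";
        push @missing, $module unless $ok;
    }
    return @missing;
}

my @missing = missing_modules(@ARGV);
print "perl $] is missing: @missing\n" if @missing;
```

Run it with each installed perl in turn, passing the documented dependency list as arguments, to see which perls are ready to run the application.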

Hugo

Re: Don't do this at home: SEGV from redefined sub
by hv (Prior) on Aug 05, 2004 at 01:58 UTC

    Dang, forgot to log in before posting this, and now PM won't let me repost as myself for some reason. I seem to be having an unusually stupid day. :(

    Update: now fixed; thanks chromatic.

    Hugo