in reply to Re: stack trace from thread exit
in thread stack trace from thread exit

So, ig, you’re saying that you find core-dumps to be a useful diagnostic technique with regard to Perl programs?   Interesting...   Could you elaborate on (or reference to) how you use them in this context?

Replies are listed 'Best First'.
Re^3: stack trace from thread exit
by ig (Vicar) on Oct 26, 2010 at 00:59 UTC
    Could you elaborate on (or reference to) how you use them in this context?

    I am not sure which context you refer to by this context.

    With regard to Perl programs in general, I have rarely used gdb and, even less often, core dumps: to investigate perl initialization, lexing and parsing, to study and customize the regular expression engine and once to track down a bug in PerlIO that produced a core dump during initialization when attempting to read a UTF-16 encoded file. I don't recall any other use at the moment...

    Much more often, I use logging and test cases to investigate Perl programs and programs in general. I developed my debugging habits about the time Brian Kernighan wrote: The most effective debugging tool is still careful thought, coupled with judiciously placed print statements., and I remain very much in that camp. But there have been times when I have found debuggers and core dumps very helpful. For example, I once found a bug in the perl tokenizer that could not be isolated with test Perl programs and print statements, and tracing the execution was much more effective that merely reading the source in toke.c.

    I haven't done much with XS, but once outside the realm of the Perl interpreter gdb can be a helpful tool.

    When I do use them with Perl programs (gdb and core dumps) I do so as I would for any other program. There are various manuals and tutorials available on the net, including http://www.gnu.org/software/gdb/documentation/ and http://www.cs.cmu.edu/~gilpin/tutorial/. I always build perl from source with -g flag and generally investigate the issue by other means first. While it is possible to attach gdb to live programs, I very much prefer small test cases run in isolation.

    If by this context you mean the situation of the OP, I have not used gdb or core dumps to investigate crashing threaded Perl programs using possibly buggy XS modules. I would try other means to isolate the fault first, but might use gdb if core files were being produced, at least to get some idea where in the code the fault was occuring, if other approaches proved unsatisfactory.

    I was merely responding to the OP who said: I'd like to trap the fault in gdb. Given that specific objective and the warning, I thought a core dump produced in the context of the warning might be what the OP was wanting.

    The test program I provided produces a core dump, if resource limits permit (I am assuming *nix system). It is easily accessed in gdb with:

    gdb /usr/bin/perl core

    (Substitute the path to the relevant instance of perl on your system.)

    The C stack is easily displayed with the bt command. Beyond that, one would have to know perl and the relevant Perl and XS programs in some detail to know where to look.

      I recompiled perl with -Doptimize=-g, but I intentionally did not carry forward my hacks to stub sigaction() calls. With ig's $SIG{__WARN__} function I still get a trace that does not include the root cause.
      Thread 2 (Thread 18154): #0 0x00007f96d1155d57 in kill () from /lib/libc.so.6 #1 0x000000000046a9bb in Perl_my_unexec (my_perl=0x3229c40) at perl.c +:3552 #2 0x00000000004cb36f in Perl_pp_goto (my_perl=0x3229c40) at pp_ctl.c +:2522 #3 0x0000000000445bb0 in Perl_runops_debug (my_perl=0x3229c40) at dum +p.c:1639 #4 0x00000000004680ed in S_call_body (my_perl=0x3229c40, myop=0x7f96c +daa6a20, is_eval=0 '\000') at perl.c:2804 #5 0x0000000000467ae2 in Perl_call_sv (my_perl=0x3229c40, sv=0x336dd3 +0, flags=2) at perl.c:2708 #6 0x00000000004493e5 in S_vdie_common (my_perl=0x3229c40, message=0x3dbdd70 "Thread 1 terminated abnormally: main=HASH(0x3ec +fb90) at /p4/smframework/app/unit_tests/test_fmwk/tests/mnc-ha/Cluste +rSuite.pm line 115.\n", msglen=136, utf8=0, warn=1 '\001') at util.c:1283 #7 0x0000000000449f14 in Perl_vwarn (my_perl=0x3229c40, pat=0x7f96d09 +095a8 "Thread %lu terminated abnormally: %_", args=0x7f96cdaa6c10) at + util.c:1447 #8 0x000000000044a0e5 in Perl_warn (my_perl=0x3229c40, pat=0x7f96d090 +95a8 "Thread %lu terminated abnormally: %_") at util.c:1480 #9 0x00007f96d0901ecb in S_ithread_run (arg=0x32289b0) at threads.xs: +480 #10 0x00007f96d14ab9ca in start_thread () from /lib/libpthread.so.0 #11 0x00007f96d12086fd in clone () from /lib/libc.so.6 #12 0x0000000000000000 in ?? ()
      I think perhaps my hacks to sigaction() were useful and/or there must be other options. As this is not indicating XSUBs code, I'll try 'perl -d' first.

        In general, the simplest way to debug thread subs, is to do so by getting them right outside of threads first.

        Ie. Call them as a standard subroutines until they perform correctly.

        Occasionally, you will encounter a bug that only shows up inside a thread, but thanks to the iThreads model, that is remarkably rare.

        On those rare occasions when this has happened to me--invariably when calling Inline::C or XS code inside the thread--I've usually found that the problem doesn't happen if I run that code in a single thread at a time. Whether that thread be the main thread, or some other. And that invariable means that some dynamic library called from the XS routines I am calling, is inherently not thread safe and cannot be fixed without having the source available.

        The typical scenario for this kind of error, is when the dll/so stores per-client state internally, using the process id as an index into a static data area. Hence, when called from two (or more) threads of the same process, it tries to re-use the same memory for two concurrent clients with inevitable results.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        Interesting that your first stack trace had a frame for XS_Net__SSH2__new and this one doesn't...

        By the time the warning that a thread terminated abnormally is issued, the tread that terminated abnormally is already terminated - it won't show up in the stack trace to the warning.

        Here is another test program that demonstrates how to produce Perl stack trace if a thread dies and in output generally. These might be useful in case you can't duplicate the failure while running under the debugger.

        This also shows that a fatal signal is inconsistent with the failure you are seeing (i.e. your process continues to execute after the thread terminates). This limits the possibilities somewhat.

        use strict; use warnings; use threads; use Carp qw(cluck); $Carp::Verbose = 1; use Inline 'C'; $SIG{__DIE__} = sub { cluck('someone died: ' . $_[0]); }; sub sub_that_dies { cluck('starting sub_that_dies'); sleep(2); die "something bad happened"; } sub start_thread { my ($arg) = @_; cluck('starting start_thread'); print "thread started: $arg\n"; if($arg == 1) { sub_that_dies(); } else { sub_that_core_dumps(); } } my $thr1 = threads->create('start_thread', 1); $thr1->join(); my $thr2 = threads->create('start_thread', 2); $thr2->join(); __END__ __C__ #include <string.h> void sub_that_core_dumps() { printf("starting sub_that_core_dumps\n"); sleep(2); printf("sub_that_core_dumps awoke\n"); Inline_Stack_Vars; Inline_Stack_Push(newSVpvf("About to segfault in sub_that_core +_dumps")); Inline_Stack_Done; perl_call_pv("Carp::cluck",0); memcpy(0,"source",1); Inline_Stack_Void; }

        Output is:

        starting start_thread at ./test2.pl line 20 thread 1 main::start_thread(1) called at ./test2.pl line 30 thread 1 eval {...} called at ./test2.pl line 30 thread 1 thread started: 1 starting sub_that_dies at ./test2.pl line 12 thread 1 main::sub_that_dies() called at ./test2.pl line 22 thread 1 main::start_thread(1) called at ./test2.pl line 30 thread 1 eval {...} called at ./test2.pl line 30 thread 1 someone died: something bad happened at ./test2.pl line 14. at ./test2.pl line 9 thread 1 main::__ANON__('something bad happened at ./test2.pl line 14.\x{a} +') called at ./test2.pl line 14 thread 1 main::sub_that_dies() called at ./test2.pl line 22 thread 1 main::start_thread(1) called at ./test2.pl line 30 thread 1 eval {...} called at ./test2.pl line 30 thread 1 Thread 1 terminated abnormally: something bad happened at ./test2.pl l +ine 14. starting start_thread at ./test2.pl line 20 thread 2 main::start_thread(2) called at ./test2.pl line 33 thread 2 eval {...} called at ./test2.pl line 33 thread 2 thread started: 2 starting sub_that_core_dumps sub_that_core_dumps awoke About to segfault in sub_that_core_dumps at ./test2.pl line 25 thread +2 main::start_thread(2) called at ./test2.pl line 33 thread 2 eval {...} called at ./test2.pl line 33 thread 2 Segmentation fault (core dumped)

        Running gdb on the core file yields the following backtrace:

        #0 sub_that_core_dumps () at test2_pl_945a.xs:15 #1 0xb7745ad2 in XS_main_sub_that_core_dumps (my_perl=0x93ec518, cv=0 +x947f61c) at test2_pl_945a.xs:30 #2 0x08137c50 in Perl_pp_entersub (my_perl=0x93ec518) at pp_hot.c:295 +2 #3 0x081019e9 in Perl_runops_debug (my_perl=0x93ec518) at dump.c:2120 #4 0x08082d5b in Perl_call_sv (my_perl=0x93ec518, sv=0x94f19c4, flags +=10) at perl.c:2611 #5 0xb7416224 in S_ithread_run (arg=0x93a97e8) at threads.xs:509 #6 0xb76b84c0 in start_thread () from /lib/i686/cmov/libpthread.so.0 #7 0xb763784e in clone () from /lib/i686/cmov/libc.so.6
Re^3: stack trace from thread exit
by falan95054 (Novice) on Oct 25, 2010 at 23:36 UTC
    ig was responding to my request. Because I suspect XSUB C functions, I was looking to get a gdb bt; and here it is:
    Thread 2 (Thread 25070): #0 0x0000000000446c34 in Perl_hv_common () No symbol table info available. #1 0x0000000000447fee in Perl_hv_common_key_len () No symbol table info available. #2 0x00000000004d3b3f in Perl_gv_fetchmeth () No symbol table info available. #3 0x00000000004d3d32 in Perl_gv_fetchmeth () No symbol table info available. #4 0x00000000004d5dae in Perl_Gv_AMupdate () No symbol table info available. #5 0x00000000004613cb in Perl_sv_bless () No symbol table info available. #6 0x0000000000461836 in Perl_newSVrv () No symbol table info available. #7 0x0000000000468ff6 in Perl_sv_setref_pv () No symbol table info available. #8 0x00007f6c0548b63e in XS_Net__SSH2__new () from /auto/share/perl/5 +.8.9/lib/site_perl/5.8.9/x86_64-linux-thread-multi/auto/Net/SSH2/SSH2 +.so No symbol table info available. #9 0x00000000004547fb in Perl_pp_entersub () No symbol table info available. #10 0x0000000000452d06 in Perl_runops_standard () No symbol table info available. #11 0x00000000004507ce in Perl_call_sv () No symbol table info available. #12 0x00007f6c05aab899 in S_ithread_run () from /auto/share/perl/5.8.9 +/lib/5.8.9/x86_64-linux-thread-multi/auto/threads/threads.so No symbol table info available. #13 0x00007f6c0664e9ca in start_thread () from /lib/libpthread.so.0 No symbol table info available. #14 0x00007f6c063ab6fd in clone () from /lib/libc.so.6 No symbol table info available. #15 0x0000000000000000 in ?? () No symbol table info available.
    I'm not sure what to make of it yet.

      If you compile perl and your XS code with -g flag, gdb will be able to show you more information, including where and in which file each frame in the stack relates to.

      See Building a debugging perl in the INSTALL file that comes with the perl source for guidance on how to do this for perl.

      This will make it easier to understand, but nothing will make it easy - between perl, XS, SSH and threads, you have quite a complicated situation.