pango has asked for the wisdom of the Perl Monks concerning the following question:
I have a rather complex POE program that runs into a segfault every now and then. Debugging it has been difficult because, due to the nature of POE, I cannot reproduce it reliably.
For more context, this POE program was running fine under an older version of perl (around v5.10.1) on an older Gentoo Linux distribution. Recently we were forced to upgrade to a newer environment: Oracle Linux 8 with perl v5.26.3. Granted, it also has a newer kernel and newer core libraries.
This particular POE program uses Net::Curl and Net::Curl::Multi to handle many network requests and run tasks around them.
I am able to get a core dump from this segmentation fault (which happens anywhere between 10 minutes and 3 hours after starting the program), and the stack trace always looks like this:
Stack trace of thread 680482:
#0  0x00007ffa7d510df8 Perl_csighandler (libperl.so.5.26)
#1  0x00007ffa7d239d80 __restore_rt (libpthread.so.0)
#2  0x00007ffa7c3b8b41 __poll (libc.so.6)
#3  0x00007ffa7d017e79 __res_context_send (libresolv.so.2)
#4  0x00007ffa7d0157cf __res_context_query (libresolv.so.2)
#5  0x00007ffa7d015e76 __res_context_querydomain (libresolv.so.2)
#6  0x00007ffa7d01646d __res_context_search (libresolv.so.2)
#7  0x00007ffa70d6715e _nss_dns_gethostbyname4_r (libnss_dns.so.2)
#8  0x00007ffa7c3acb9e gaih_inet.constprop.6 (libc.so.6)
#9  0x00007ffa7c3ade5b getaddrinfo (libc.so.6)
#10 0x00007ffa787fe38b Curl_getaddrinfo_ex (libcurl.so.4)
#11 0x00007ffa78809383 getaddrinfo_thread (libcurl.so.4)
#12 0x00007ffa78806a3f curl_thread_create_thunk (libcurl.so.4)
#13 0x00007ffa7d22f1da start_thread (libpthread.so.0)
#14 0x00007ffa7c2bf8d3 __clone (libc.so.6)

Stack trace of thread 608560:
#0  0x00007ffa7c321139 __malloc_fork_unlock_parent (libc.so.6)
#1  0x00007ffa7c38e33d __libc_fork (libc.so.6)
#2  0x00007ffa7d583d82 Perl_pp_fork (libperl.so.5.26)
#3  0x00007ffa7d528315 Perl_runops_standard (libperl.so.5.26)
#4  0x00007ffa7d568941 S_docatch (libperl.so.5.26)
#5  0x00007ffa7d528315 Perl_runops_standard (libperl.so.5.26)
#6  0x00007ffa7d49ff2d Perl_call_sv (libperl.so.5.26)
#7  0x00007ffa78e7914b poe_data_ev_dispatch_due (EPoll.so)
#8  0x00007ffa78e78420 lp_loop_run (EPoll.so)
#9  0x00007ffa7d5304a9 Perl_pp_entersub (libperl.so.5.26)
#10 0x00007ffa7d528315 Perl_runops_standard (libperl.so.5.26)
#11 0x00007ffa7d4a810f perl_run (libperl.so.5.26)
#12 0x00005635e9200eda main (perl)
#13 0x00007ffa7c2c08a5 __libc_start_main (libc.so.6)
#14 0x00005635e9200f1e _start (perl)
When debugging further in the Perl code, however, I see that this segfault always happens inside a POE response handler that runs a tar command (basically, after some data is downloaded, we fork using POE::Wheel::Run to extract the archive):
warn("Data has been downloaded");
warn("Running tar");
# Here is where the segfault always happens
my $child = POE::Wheel::Run->new(
    Program => [
        '/bin/tar',
        # tar args omitted
    ],
    StdoutEvent => "wheel_stdout",
    StderrEvent => "wheel_stderr",
);
Given where it happens in the Perl code, I initially assumed it had something to do with Wheel::Run or tar, but changing that workflow does not fix the issue. Looking at the core dump, I am thinking it has more to do with libcurl, but I am stumped as to what could cause it to segfault like that.
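For anyone who wants to exercise the Wheel::Run step outside the full application, a minimal self-contained sketch might look like the one below. It is not the original program: the `--version` argument stands in for the omitted tar args, and the `wheel_close` and `wheel_sigchld` events and the `%result` hash are illustrative names. It assumes POE is installed and that `/bin/tar` exists. It only shows the fork-and-reap path seen in the second stack trace and does not involve Net::Curl, so it is not expected to reproduce the crash by itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POE qw(Wheel::Run);

# Collected output and exit status of the child, filled in by the handlers.
my %result = (stdout => [], stderr => [], exit_status => undef);

POE::Session->create(
    inline_states => {
        _start => sub {
            my ($kernel, $heap) = @_[KERNEL, HEAP];

            # Fork and exec the child; '--version' is a placeholder for
            # the tar args that the original post omitted.
            my $child = POE::Wheel::Run->new(
                Program     => [ '/bin/tar', '--version' ],
                StdoutEvent => 'wheel_stdout',
                StderrEvent => 'wheel_stderr',
                CloseEvent  => 'wheel_close',
            );

            # Ask POE to reap this PID and report its exit status.
            $kernel->sig_child($child->PID, 'wheel_sigchld');

            # Keep a reference, otherwise the wheel is destroyed at once.
            $heap->{wheel} = $child;
        },

        # ARG0 is one line of child output.
        wheel_stdout => sub { push @{ $result{stdout} }, $_[ARG0] },
        wheel_stderr => sub { push @{ $result{stderr} }, $_[ARG0] },

        # The child closed all of its output handles: drop the wheel.
        wheel_close => sub { delete $_[HEAP]{wheel} },

        # ARG2 carries the child's wait status (same layout as $?).
        wheel_sigchld => sub { $result{exit_status} = $_[ARG2] },
    },
);

POE::Kernel->run;

printf "tar exited with status %d and printed %d line(s)\n",
    $result{exit_status} >> 8, scalar @{ $result{stdout} };
```

Registering `sig_child` keeps the session alive until the child has been reaped, so the exit status is recorded even when the close event arrives first.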
Replies are listed 'Best First'.
Re: POE program running into sporadic segfault
by etj (Priest) on Aug 28, 2024 at 19:58 UTC
by pango (Initiate) on Aug 29, 2024 at 00:00 UTC
Re: POE program running into sporadic segfault
by NERDVANA (Priest) on Aug 29, 2024 at 02:26 UTC
by etj (Priest) on Aug 29, 2024 at 17:15 UTC
by NERDVANA (Priest) on Aug 29, 2024 at 21:27 UTC
Re: POE program running into sporadic segfault
by jeffenstein (Hermit) on Aug 30, 2024 at 06:53 UTC