in reply to Re^2: Segmentation fault with HTML::TagParser module
in thread Segmentation fault with HTML::TagParser module

Nice list. But how can you be sure HTML::TagParser is to blame? Which OS, OS version, perl version, etc. are you using? You make it darn difficult to help you by giving information piecemeal...

Try, e.g., doing without HTML::Tidy. Of the modules on your list, that one at least has C bindings (against libtidy). On my system, HTML-Tidy-1.08 won't build with libtidy-0.99: it segfaults in one of the tests ;-)

Locate its shared object (Tidy.so); it must be somewhere in perl's search path (@INC). Run ldd Tidy.so and compare the library versions in the output with the library versions actually installed on your system: they must match exactly.

If HTML::Tidy isn't the culprit, examine the other modules you use likewise.
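A small sketch of the check described above, assuming the usual layout in which DynaLoader/XSLoader look for an XS module's shared object at auto/<module path>/<last name part>.<dlext> below one of the @INC directories (so HTML::Tidy maps to auto/HTML/Tidy/Tidy.so on Linux). It takes module names on the command line, so the other modules can be examined the same way. The script and its helper names are hypothetical, not something from this thread.

```perl
#!/usr/bin/perl
# Hypothetical helper: find the XS shared object(s) of the given modules
# below @INC and run ldd on each, so the linked library versions can be
# compared with what is installed. Defaults to HTML::Tidy.
use strict;
use warnings;
use Config;

# Relative path of a module's shared object, e.g.
# HTML::Tidy -> auto/HTML/Tidy/Tidy.so ($Config{dlext} is "so" on Linux).
sub so_relpath {
    my ($module) = @_;
    my @parts = split /::/, $module;
    return join '/', 'auto', @parts, "$parts[-1].$Config{dlext}";
}

# Every copy of that shared object found under the @INC directories
# (code refs or objects that may sit in @INC are skipped).
sub find_shared_objects {
    my ($module) = @_;
    my $rel = so_relpath($module);
    return grep { -f $_ } map { "$_/$rel" } grep { !ref $_ } @INC;
}

my @modules = @ARGV ? @ARGV : ('HTML::Tidy');
for my $module (@modules) {
    my @found = find_shared_objects($module);
    if (!@found) {
        print "$module: no ", so_relpath($module), " under \@INC\n";
        next;
    }
    for my $so (@found) {
        print "== $module: $so\n";
        system('ldd', $so) == 0 or warn "ldd failed for $so\n";
    }
}
```

Usage (file name hypothetical): perl find_xs.pl HTML::Tidy HTML::TagParser. A module for which no shared object is reported has no compiled C part of its own to mismatch.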

update: Or better, run your perl with gdb. Then you can produce a backtrace and see directly where the problem is. Sample session:

qwurx [shmem] ~/HTML-Tidy-1.08 > gdb perl
GNU gdb Red Hat Linux (6.6-16.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -T -I blib/lib -I blib/arch t/perfect.t
Starting program: /usr/bin/perl -I blib/lib -I blib/arch t/perfect.t
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1209018688 (LWP 9077)]
(no debugging symbols found)
1..3
ok 1 - use HTML::Tidy;
ok 2 - The object isa HTML::Tidy
# running tidy->parse
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209018688 (LWP 9077)]
0x00144473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
(gdb) bt
#0  0x00144473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
#1  0x001132e0 in XS_HTML__Tidy__tidy_messages (my_perl=0x9c2b008, cv=0x9d5881c) at Tidy.xs:99
#2  0x041c343d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#3  0x041bc89f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#4  0x0416210e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#5  0x0804921e in main ()
(gdb)

As you can see, the segfault occurred in tidyBufFree() from /usr/lib/libtidy-0.99.so.0 (frame #0). Doing the same with your script should reveal what's going wrong.
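The session above is interactive. A minimal sketch of a non-interactive variant, assuming gdb is in PATH: the hypothetical wrapper below builds a gdb command line with -batch (execute the -ex commands, then exit) and --args (everything after it is the program plus its arguments), and uses $^X so that the same perl binary that runs the wrapper is the one being debugged.

```perl
#!/usr/bin/perl
# Hypothetical wrapper (not from the thread): run a Perl script under gdb
# non-interactively and print a backtrace if it crashes.
use strict;
use warnings;

# Build the gdb argument list.
#   -batch : execute the -ex commands, then exit (no interactive prompt)
#   -ex    : one gdb command each ("run", then "bt" after the crash)
#   --args : the rest is the program to debug plus its arguments
sub gdb_command {
    my ($gdb_cmds, @script_and_args) = @_;
    return ('gdb', '-batch', (map { ('-ex', $_) } @$gdb_cmds),
            '--args', $^X, @script_and_args);
}

if (@ARGV) {
    my @cmd = gdb_command(['run', 'bt'], @ARGV);
    system(@cmd) == 0 or warn "gdb exited with status $?\n";
}
```

Usage (wrapper file name hypothetical): perl gdb_bt.pl burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=5" -toolbar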

Re^4: Segmentation fault with HTML::TagParser module
by raghu (Novice) on Mar 03, 2009 at 17:03 UTC
    Thank you, shmem, for your valuable suggestion.
    1) When I debugged my script (using print statements), it terminated with a segmentation fault right after the HTML::TagParser instance was called. So the segmentation fault occurs in the parse subroutine of the HTML::TagParser module.
    2) The gdb output is given below:
    (gdb) run burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=1" -toolbar
    Starting program: /usr/bin/perl burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=1" -toolbar
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    (no debugging symbols found)
    [Thread debugging using libthread_db enabled]
    [New Thread -1212090688 (LWP 12793)]
    (no debugging symbols found)
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1212090688 (LWP 12793)]
    0xb7ecf9de in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

      Produce a backtrace with the gdb 'bt' command. At this point, seeing some of your code would help, too. But not the whole bunch: a minimal version that exhibits the same problem.
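Besides trimming the code, it can help to trim the input page. A sketch of one way to do that (a technique not proposed in the thread): binary-search for the shortest prefix of the page that still crashes, running each attempt in a forked child so that the segfault only kills the child. It assumes the full input crashes and that crashing is monotone in prefix length (if a prefix crashes, every longer prefix crashes too); all names are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical helper: find the shortest prefix of an input string that
# still makes a piece of code die with SIGSEGV.
use strict;
use warnings;
use POSIX ();

# Run $code->($input) in a forked child; true if the child was killed by
# SIGSEGV. The parent survives the crash.
sub segfaults {
    my ($code, $input) = @_;
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {                   # child
        eval { $code->($input) };      # a Perl-level die is not a segfault
        POSIX::_exit(0);               # skip END blocks and buffer flushing
    }
    waitpid($pid, 0);
    return ($? & 127) == POSIX::SIGSEGV();
}

# Binary search over the prefix length. Invariant: the prefix of length
# $hi crashes; every length below $lo is known not to crash.
sub shortest_crashing_prefix {
    my ($code, $input) = @_;
    my ($lo, $hi) = (0, length $input);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if (segfaults($code, substr($input, 0, $mid))) { $hi = $mid }
        else                                          { $lo = $mid + 1 }
    }
    return substr($input, 0, $hi);
}
```

With the thread's code, $code could be sub { HTML::TagParser->new($_[0])->getElementsByTagName("title") } and $input the content of the saved page; the returned prefix is then a smaller test case to post.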

        Shmem
        This error seems to occur when we parse the web page. The code where we get the error is shown below:
        my $html_local = HTML::TagParser->new($local_content);
        my $ele = $html_local->getElementsByTagName("title");    # get element by title tag

        $local_content holds the entire content of the web page we are crawling.
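One way to get the minimal version asked for earlier in the thread, sketched under the assumption that the crawled page has first been saved to a local file (for example with wget): the hypothetical script below runs only the two HTML::TagParser calls quoted above on that file, with the crawler out of the picture. innerText is a method of the HTML::TagParser::Element objects that getElementsByTagName returns; the script and helper names are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical minimal reproduction: parse a saved page with the same two
# HTML::TagParser calls as the crawler, and nothing else.
use strict;
use warnings;

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    local $/;                    # undefined record separator: read it all
    my $content = <$fh>;
    close $fh;
    return $content;
}

# The two calls from the original code, in isolation.
sub title_of {
    my ($local_content) = @_;
    require HTML::TagParser;     # loaded at run time, only when needed
    my $html_local = HTML::TagParser->new($local_content);
    my $ele = $html_local->getElementsByTagName("title");
    return $ele ? $ele->innerText : undef;
}

if (@ARGV) {
    my $title = title_of(slurp($ARGV[0]));
    print defined $title ? "title: $title\n" : "no <title> found\n";
}
```

If perl tagparser_repro.pl page.html (names hypothetical) still segfaults, the problem is reproduced without the rest of burst_spider.pl and can be run under gdb on its own.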

        The backtrace is provided below for your reference:
        (gdb) run burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=5" -toolbar
        Starting program: /usr/bin/perl burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=5" -toolbar
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        (no debugging symbols found)
        [Thread debugging using libthread_db enabled]
        [New Thread -1211443520 (LWP 14136)]
        (no debugging symbols found)
        bt
        Program received signal SIGSEGV, Segmentation fault.
        [Switching to Thread -1211443520 (LWP 14136)]
        0xb7f6d9de in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        (gdb) bt
        #0  0xb7f6d9de in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        #1  0xb7f7120b in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        #2  0xb7f7120b in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        #3  0xb7f7120b in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        #4  0xb7f7120b in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        #5  0xb7f7120b in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        #6  0xb7f7120b in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
        #7  0xb7f7120b in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

        It seems to be never-ending.
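Frames #1 to #7 all show the same return address (0xb7f7120b), which typically indicates deep recursion; since gdb found no debugging symbols, the function name it prints may only be approximate. In gdb, bt N prints only the innermost N frames and bt -N only the outermost N. To look at a long backtrace as a whole, a hypothetical helper like the one below collapses runs of identical consecutive frames into a single line with a count. It assumes the gdb output was saved to a file first, e.g. with gdb -batch -ex run -ex bt --args perl burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=5" -toolbar > bt.txt

```perl
#!/usr/bin/perl
# Hypothetical helper: summarize a saved gdb backtrace by collapsing
# consecutive frames with identical text (address, function, library)
# into one line, so deep recursion shows up as a frame count.
use strict;
use warnings;

# Input: lines like
#   #3  0xb7f7120b in Perl_regclass_swash () from /usr/lib/.../libperl.so
# Lines that are not backtrace frames are skipped.
sub collapse_frames {
    my @out;
    my ($first, $last, $where);    # current run: first/last frame no., text
    for my $line (@_) {
        next unless $line =~ /^\s*#(\d+)\s+(.*?)\s*$/;
        my ($n, $rest) = ($1, $2);
        if (defined $where && $rest eq $where) {
            $last = $n;            # same frame text as before: extend run
            next;
        }
        push @out, run_line($first, $last, $where) if defined $where;
        ($first, $last, $where) = ($n, $n, $rest);
    }
    push @out, run_line($first, $last, $where) if defined $where;
    return @out;
}

# Format one run of identical frames.
sub run_line {
    my ($first, $last, $where) = @_;
    return "#$first $where" if $first == $last;
    my $count = $last - $first + 1;
    return "#$first-#$last $where ($count frames)";
}

if (@ARGV) {
    print "$_\n" for collapse_frames(<>);
}
```

Usage (file name hypothetical): perl collapse_bt.pl bt.txt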