shmem:
The modules which I am using currently are listed below.
HTML::TreeBuilder;
HTML::LinkExtractor;
HTML::Normalize;
HTML::TagParser;
Text::DoubleMetaphone qw( double_metaphone );
Encode;
HTML::Tidy;
Time::HiRes::Value;
Lingua::EN::Fathom;
Lingua::EN::Summarize;
WWW::Mechanize;
Math::BigFloat;
Nice list. And how are you sure HTML::TagParser is to blame? What OS, OS version, perl version etc. are you using? You make it darn difficult to help you, giving information piecemeal...
Try, for example, doing without HTML::Tidy. Of the modules in your list, that one at least has C bindings (against libtidy).
On my system, HTML-Tidy-1.08 won't build with libtidy-0.99 - it segfaults in one of the tests ;-)
Locate its shared object (Tidy.so); it must be somewhere in perl's search path (@INC). Run ldd Tidy.so and compare the library versions in the output with the library versions actually installed on your system. They must match exactly.
If HTML::Tidy isn't the culprit, examine the other modules you use likewise.
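A rough sketch of that check as a shell helper, in case it saves some typing. The function name find_so is made up for this post, and it assumes the usual XS layout where the shared object is named after the last part of the module name (Tidy.so for HTML::Tidy) and lives somewhere below a directory in @INC:

```shell
# find_so NAME DIR...
# Print every file called NAME.so found below the given directories.
# (find_so is a hypothetical helper name, not part of any toolchain.)
find_so() {
    name=$1
    shift
    for dir in "$@"; do
        # Skip @INC entries that do not exist on this system.
        if [ -d "$dir" ]; then
            find "$dir" -type f -name "$name.so" 2>/dev/null
        fi
    done
}

# Possible usage, assuming perl and ldd are on PATH and that no @INC
# entry contains whitespace ("." is dropped to avoid scanning the cwd):
#   for so in $(find_so Tidy $(perl -e 'print join " ", grep { $_ ne "." } @INC')); do
#       echo "== $so"; ldd "$so"
#   done
```

The same call with a different first argument covers the other XS modules in the list; pure-perl modules simply produce no output.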
update: Or better, run your perl with gdb. Then you can produce a backtrace and see directly where the problem is. Sample session:
qwurx [shmem] ~/HTML-Tidy-1.08 > gdb perl
GNU gdb Red Hat Linux (6.6-16.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -T -I blib/lib -I blib/arch t/perfect.t
Starting program: /usr/bin/perl -I blib/lib -I blib/arch t/perfect.t
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1209018688 (LWP 9077)]
(no debugging symbols found)
1..3
ok 1 - use HTML::Tidy;
ok 2 - The object isa HTML::Tidy
# running tidy->parse
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209018688 (LWP 9077)]
0x00144473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
(gdb) bt
#0 0x00144473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
#1 0x001132e0 in XS_HTML__Tidy__tidy_messages (my_perl=0x9c2b008, cv=0x9d5881c) at Tidy.xs:99
#2 0x041c343d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#3 0x041bc89f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#4 0x0416210e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#5 0x0804921e in main ()
(gdb)
As you can see, the segfault was in tidyBufFree () from /usr/lib/libtidy-0.99.so.0 (frame #0). Doing likewise with your script should reveal what's going wrong.
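If you'd rather not type the session by hand, the same run-then-backtrace sequence can be scripted. This is only a sketch: perl_bt is a name invented here, the GDB and PERL variables are my own convention for overriding the binaries, and it assumes a gdb that understands -batch, -ex and --args (a very old gdb may lack -ex):

```shell
# perl_bt SCRIPT [ARGS...]
# Run perl on SCRIPT under gdb non-interactively and print a backtrace
# once the program stops (for example on SIGSEGV).
# (perl_bt, GDB and PERL are hypothetical names chosen for this sketch.)
perl_bt() {
    "${GDB:-gdb}" -batch -ex run -ex bt --args "${PERL:-perl}" "$@"
}

# Possible usage with the script from this thread:
#   perl_bt burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=1" -toolbar
```

Everything after --args is handed to perl unchanged, so the script's own switches need no extra quoting beyond what the shell already requires.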
Thank you shmem, for your valuable suggestion.
1) When I debugged my script with print statements, it terminated with a segmentation fault after the HTML::TagParser instance is called. So, within the HTML::TagParser module, the segmentation fault occurs in its parse subroutine.
2) The output from running the script under gdb is given below.
(gdb) run burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=1" -toolbar
Starting program: /usr/bin/perl burst_spider.pl -s "http://technologyreview.com/aggregates.aspx?p=1" -toolbar
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1212090688 (LWP 12793)]
(no debugging symbols found)
Program received signal SIGSEGV, Segmentation fault.
Switching to Thread -1212090688 (LWP 12793)
0xb7ecf9de in Perl_regclass_swash () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so