Is there a workaround?
Yes. Use HTML::Parser or such to tidy your HTML.
Perl is very good at such tasks, and there really is no need to interface a C library to tidy up HTML. On my platform HTML::Tidy doesn't pass the tests due to bugs in libtidy.
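As a minimal sketch of that idea, assuming the goal is only to normalize tag case and attribute quoting (it does not repair broken nesting or missing elements the way libtidy tries to), HTML::Parser's version 3 handler API can be used roughly like this. The helper name `normalize_html` is made up for this illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(encode_entities);

# Hypothetical helper: re-emit HTML with lowercase tag names and
# double-quoted attribute values. Everything else passes through as-is.
sub normalize_html {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,

        # Start tags: HTML::Parser already lowercases tag and attribute names.
        start_h => [
            sub {
                my ($tag, $attr, $attrseq) = @_;
                $out .= "<$tag";
                for my $name (@$attrseq) {
                    next if $name eq '/';    # trailing slash of <br /> style tags
                    # Attribute values arrive entity-decoded, so re-encode them.
                    $out .= sprintf ' %s="%s"', $name,
                        encode_entities($attr->{$name}, '<>&"');
                }
                $out .= '>';
            },
            'tagname, attr, attrseq',
        ],

        # End tags: tagname comes lowercased and without the slash.
        end_h => [ sub { $out .= "</$_[0]>" }, 'tagname' ],

        # Text, comments, declarations: copy the original source text.
        default_h => [ sub { $out .= $_[0] }, 'text' ],
    );

    $parser->parse($html);
    $parser->eof;
    return $out;
}

print normalize_html('<P CLASS=intro>Hello <B>world</B></P>'), "\n";
```

Anything beyond this (closing unclosed elements, dropping stray tags) would need extra bookkeeping in the handlers, for example a stack of open elements.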
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00-load..............ok
t/cfg-for-parse........ok
t/clean-crash..........ok
t/extra-quote..........ok
t/ignore-text..........ok
t/ignore...............ok
t/levels...............ok
t/message..............ok
t/opt-00...............ok
t/perfect.............. Failed 1/3 subtests
t/pod-coverage.........ok
t/pod..................ok
t/roundtrip............ok
t/segfault-form........ok
t/simple...............1/4 Unknown error type: line 2 column 5 - Info: <body> previously mentioned at t/simple.t line 17
Unknown error type: line 2 column 5 - Info: <body> previously mentioned at t/simple.t line 17
Unknown error type: line 2 column 5 - Info: <body> previously mentioned at t/simple.t line 17
t/simple...............ok
t/too-many-titles......1/3 Unknown error type: line 4 column 9 - Info: <head> previously mentioned at t/too-many-titles.t line 22
t/too-many-titles......ok
t/unicode.............. Failed 1/7 subtests
t/venus................1/3 Unknown error type: line 8 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
Unknown error type: line 10 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
Unknown error type: line 11 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
Unknown error type: line 12 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
Unknown error type: line 15 column 2 - Info: <h2> previously mentioned at t/venus.t line 21
Unknown error type: line 17 column 2 - Info: <h4> previously mentioned at t/venus.t line 21
Unknown error type: line 18 column 2 - Info: <h4> previously mentioned at t/venus.t line 21
Unknown error type: line 20 column 2 - Info: <h4> previously mentioned at t/venus.t line 21
Unknown error type: line 25 column 3 - Info: <h4> previously mentioned at t/venus.t line 21
t/venus................ok
t/version..............ok
t/wordwrap.............1/2 Unknown error type: line 1 column 1 - Info: <head> previously mentioned at t/wordwrap.t line 35
t/wordwrap.............ok
Test Summary Report
-------------------
t/perfect.t (Wstat: 11 Tests: 2 Failed: 0)
Parse errors: Bad plan. You planned 3 tests but ran 2.
t/unicode.t (Wstat: 11 Tests: 6 Failed: 0)
Parse errors: Bad plan. You planned 7 tests but ran 6.
Files=20, Tests=78, 2 wallclock secs ( 0.13 usr 0.04 sys + 1.19 cusr 0.25 csys = 1.61 CPU)
Result: FAIL
Failed 2/20 test programs. 0/78 subtests failed.
make: *** [test_dynamic] Error 255
Nice test output. Let's grab the first reported failure.
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/perfect.t
t/perfect...... Failed 1/3 subtests
Test Summary Report
-------------------
t/perfect.t (Wstat: 11 Tests: 2 Failed: 0)
Parse errors: Bad plan. You planned 3 tests but ran 2.
Files=1, Tests=2, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.10 cusr 0.02 csys = 0.16 CPU)
Result: FAIL
Failed 1/1 test programs. 0/2 subtests failed.
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 >
Bad plan? 3 tests planned but ran only two? Let's see. Ah, in t/perfect.t I see
use Test::More tests => 3;
and then only two tests
...
isa_ok( $tidy, 'HTML::Tidy' );
...
is( scalar @returned, 0, 'Should have no messages' );
OK, an off-by-one, a common typo. Let's change the 3 to 2 and run again. Output:
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/perfect.t
t/perfect...... All 2 subtests passed
Test Summary Report
-------------------
Files=1, Tests=2, 0 wallclock secs ( 0.01 usr 0.01 sys + 0.04 cusr 0.00 csys = 0.06 CPU)
Result: FAIL
Failed 1/1 test programs. 0/2 subtests failed.
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 >
Huh? "All 2 subtests passed", yet "Result: FAIL"? What's going on here? Let's try to run the test script without the harness.
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > perl -Mblib t/perfect.t
"-T" is on the #! line, it must also be used on the command line at t/perfect.t line 1.
Ah, ok. I have to pass the -T switch on the command line, let's do that.
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > perl -Mblib -T t/perfect.t
Insecure dependency in require while running with -T switch at t/perfect.t line 5.
BEGIN failed--compilation aborted at t/perfect.t line 5.
WTF? So ExtUtils::Command::MM turns the "insecure dependencies" into secure ones? I won't dig into that any further, I'm interested in what happens with that dratted test script. I eliminate the -T switch from the shebang line in t/perfect.t - next run:
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > perl -Mblib t/perfect.t
1..2
ok 1 - use HTML::Tidy;
ok 2 - The object isa HTML::Tidy
Segmentation fault
Segfault? Let's fire up the debugger:
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > gdb perl
GNU gdb Red Hat Linux (6.6-16.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
...
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -Mblib t/perfect.t
Starting program: /usr/bin/perl -Mblib t/perfect.t
(no debugging symbols found)
...
[Thread debugging using libthread_db enabled]
[New Thread -1208416576 (LWP 29388)]
(no debugging symbols found)
(no debugging symbols found)
1..2
ok 1 - use HTML::Tidy;
ok 2 - The object isa HTML::Tidy
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208416576 (LWP 29388)]
0x00147473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
(gdb) bt
#0 0x00147473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
#1 0x001162e0 in XS_HTML__Tidy__tidy_messages (my_perl=0x9d9b008, cv=0x9efa564) at Tidy.xs:99
#2 0x0208833d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#3 0x0208179f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#4 0x0202710e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#5 0x0804921e in main ()
(gdb) q
The program is running. Exit anyway? (y or n) y
qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 >
So the crash happens inside libtidy (in tidyBufFree, called from Tidy.xs line 99), and I won't debug that. But as an interesting side note: did you notice all the steps necessary to arrive at that conclusion? Now tell me that the Perl testing interface is great and doesn't suck. Expect a rant of mine about that any time soon.
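In hindsight, the "Wstat: 11" in the harness summary was probably the hint: as far as I know, Wstat is the raw wait status of the test process, laid out like Perl's $?. The low 7 bits hold the signal that killed the process, and the high byte holds the exit code. A small sketch of decoding it, with `decode_wstat` being a name invented for this example:

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: split a raw wait status (same layout as $?)
# into exit code, terminating signal, and core-dump flag.
sub decode_wstat {
    my ($wstat) = @_;

    my $signal = $wstat & 127;             # low 7 bits: signal number
    my $core   = ($wstat & 128) ? 1 : 0;   # bit 7: core dump flag
    my $exit   = $wstat >> 8;              # high byte: exit code

    # $Config{sig_name} lists signal names, indexed by signal number.
    my @names = split ' ', $Config{sig_name};

    return {
        exit    => $exit,
        signal  => $signal,
        core    => $core,
        signame => $signal ? $names[$signal] : undef,
    };
}

my $status = decode_wstat(11);
printf "exit=%d signal=%d (%s)\n",
    $status->{exit}, $status->{signal}, $status->{signame} // 'none';
```

On Linux, signal 11 is SEGV, which matches the segmentation fault seen when running the script by hand. It would also explain why fixing the plan did not turn the result into a pass: the test process was still dying from a signal.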
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
|