Is there a workaround?

Yes. Use HTML::Parser or a similar module to tidy your HTML.
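A minimal sketch of that idea, assuming HTML::Parser 3.x is installed. tidy_html() is a hypothetical helper made up for this example, not part of HTML::Parser, and it only balances tags (closes unclosed elements, drops stray end tags, passes text, comments and the doctype through unchanged), which is a small subset of what libtidy does:

```perl
use strict;
use warnings;
use HTML::Parser 3;

# Elements that never take an end tag, so they are never pushed on the stack.
my %void = map { $_ => 1 }
    qw(area base br col embed hr img input link meta param source track wbr);

# Hypothetical helper: re-emit $html, adding end tags for unclosed elements
# and dropping end tags that have no matching open element.
sub tidy_html {
    my ($html) = @_;
    my $out = '';
    my @open;    # stack of element names currently open

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            $out .= $text;                        # keep the original start tag
            push @open, $tag unless $void{$tag};
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag) = @_;
            return unless grep { $_ eq $tag } @open;   # stray end tag: drop it
            while ( defined( my $t = pop @open ) ) {   # close inner elements first
                $out .= "</$t>";
                last if $t eq $tag;
            }
        }, 'tagname' ],
        # text, comments, <!DOCTYPE ...> etc. pass through verbatim
        default_h => [ sub { $out .= $_[0] if defined $_[0] }, 'text' ],
    );
    $p->parse($html);
    $p->eof;                               # flush any buffered text
    $out .= "</$_>" for reverse @open;     # close whatever is still open
    return $out;
}

print tidy_html('<!DOCTYPE html><p>one <b>two</p><ul><li>three'), "\n";
```

Keeping a stack of open elements is the simplest way to get well-nested output; a real tidier also needs HTML's implied-end-tag rules (e.g. a new `<li>` closing the previous one), which this sketch ignores.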

Perl is very good at such tasks, and there really is no need to interface with a C library to tidy up HTML. On my platform, HTML::Tidy doesn't even pass its own tests, due to bugs in libtidy:

    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > make test
    PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
    t/00-load..............ok
    t/cfg-for-parse........ok
    t/clean-crash..........ok
    t/extra-quote..........ok
    t/ignore-text..........ok
    t/ignore...............ok
    t/levels...............ok
    t/message..............ok
    t/opt-00...............ok
    t/perfect.............. Failed 1/3 subtests
    t/pod-coverage.........ok
    t/pod..................ok
    t/roundtrip............ok
    t/segfault-form........ok
    t/simple...............1/4 Unknown error type: line 2 column 5 - Info: <body> previously mentioned at t/simple.t line 17
    Unknown error type: line 2 column 5 - Info: <body> previously mentioned at t/simple.t line 17
    Unknown error type: line 2 column 5 - Info: <body> previously mentioned at t/simple.t line 17
    t/simple...............ok
    t/too-many-titles......1/3 Unknown error type: line 4 column 9 - Info: <head> previously mentioned at t/too-many-titles.t line 22
    t/too-many-titles......ok
    t/unicode.............. Failed 1/7 subtests
    t/venus................1/3 Unknown error type: line 8 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
    Unknown error type: line 10 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
    Unknown error type: line 11 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
    Unknown error type: line 12 column 2 - Info: <h1> previously mentioned at t/venus.t line 21
    Unknown error type: line 15 column 2 - Info: <h2> previously mentioned at t/venus.t line 21
    Unknown error type: line 17 column 2 - Info: <h4> previously mentioned at t/venus.t line 21
    Unknown error type: line 18 column 2 - Info: <h4> previously mentioned at t/venus.t line 21
    Unknown error type: line 20 column 2 - Info: <h4> previously mentioned at t/venus.t line 21
    Unknown error type: line 25 column 3 - Info: <h4> previously mentioned at t/venus.t line 21
    t/venus................ok
    t/version..............ok
    t/wordwrap.............1/2 Unknown error type: line 1 column 1 - Info: <head> previously mentioned at t/wordwrap.t line 35
    t/wordwrap.............ok
    Test Summary Report
    -------------------
    t/perfect.t (Wstat: 11 Tests: 2 Failed: 0)
      Parse errors: Bad plan. You planned 3 tests but ran 2.
    t/unicode.t (Wstat: 11 Tests: 6 Failed: 0)
      Parse errors: Bad plan. You planned 7 tests but ran 6.
    Files=20, Tests=78, 2 wallclock secs ( 0.13 usr 0.04 sys + 1.19 cusr 0.25 csys = 1.61 CPU)
    Result: FAIL
    Failed 2/20 test programs. 0/78 subtests failed.
    make: *** [test_dynamic] Error 255

Nice test output. Let's grab the first reported failure.

    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/perfect.t
    t/perfect...... Failed 1/3 subtests
    Test Summary Report
    -------------------
    t/perfect.t (Wstat: 11 Tests: 2 Failed: 0)
      Parse errors: Bad plan. You planned 3 tests but ran 2.
    Files=1, Tests=2, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.10 cusr 0.02 csys = 0.16 CPU)
    Result: FAIL
    Failed 1/1 test programs. 0/2 subtests failed.
    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 >
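A side note on `Wstat: 11`: that is the raw wait status of the test process, laid out like perl's `$?` after `system` or `wait`. The low seven bits hold the number of the signal that terminated the process, bit 7 (value 128) flags a core dump, and the high byte is the exit code. On Linux, signal 11 is SIGSEGV, so the test script was killed by a segfault rather than exiting on its own. A small decoder using only core modules (decode_wstat() is a name made up for this sketch):

```perl
use strict;
use warnings;
use Config;

# Map signal numbers to names from perl's build configuration.
# sig_name and sig_num are parallel lists; keep the first name seen for a
# number so aliases listed later (e.g. IOT for ABRT) don't replace it.
my %signame;
{
    my @names = split ' ', $Config{sig_name};
    my @nums  = split ' ', $Config{sig_num};
    $signame{ $nums[$_] } //= $names[$_] for 0 .. $#names;
}

# Split a wait status (Test::Harness's "Wstat", or perl's $?) into parts.
sub decode_wstat {
    my ($wstat) = @_;
    my $sig = $wstat & 127;                   # low 7 bits: terminating signal
    return {
        exit   => $wstat >> 8,                # high byte: exit code
        signal => $sig,
        name   => $sig ? 'SIG' . ( $signame{$sig} // $sig ) : '',
        core   => ( $wstat & 128 ) ? 1 : 0,   # bit 7: core dumped
    };
}

# The Wstat above, and a child that exited with code 255 (make's "Error 255").
for my $wstat ( 11, 255 << 8 ) {
    my $d = decode_wstat($wstat);
    if ( $d->{signal} ) {
        printf "%d: killed by signal %d (%s)%s\n", $wstat,
            $d->{signal}, $d->{name}, $d->{core} ? ', core dumped' : '';
    }
    else {
        printf "%d: exited with code %d\n", $wstat, $d->{exit};
    }
}
```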

Bad plan? Three tests planned, but only two ran? Let's see. Ah, in t/perfect.t I see

    use Test::More tests => 3;

and then only two tests

    ...
    isa_ok( $tidy, 'HTML::Tidy' );
    ...
    is( scalar @returned, 0, 'Should have no messages' );
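A fixed `tests => N` plan has to be kept in sync with the test calls by hand. Test::More 0.88 and later also offer done_testing(), which prints the plan at the end from the number of tests that actually ran. A minimal sketch, with placeholder checks rather than the real HTML::Tidy ones:

```perl
use strict;
use warnings;
use Test::More;    # no test count declared up front

# Placeholder checks standing in for the use_ok/isa_ok/is calls
# in t/perfect.t, which need HTML::Tidy installed.
ok( 1, 'first check' );
is( 1 + 1, 2, 'second check' );

# Prints the plan ("1..2") after the tests have run. If the script dies
# or crashes before reaching this line, no plan is printed and the
# harness still reports the file as failed.
done_testing();
```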

OK, off by one, a common typo. Let's change the 3 to a 2 and run again. Output:

    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/perfect.t
    t/perfect...... All 2 subtests passed
    Test Summary Report
    -------------------
    Files=1, Tests=2, 0 wallclock secs ( 0.01 usr 0.01 sys + 0.04 cusr 0.00 csys = 0.06 CPU)
    Result: FAIL
    Failed 1/1 test programs. 0/2 subtests failed.
    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 >

Huh? "All 2 subtests passed", yet "Result: FAIL"? What's going on here? Let's try running the test script without the harness.

    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > perl -Mblib t/perfect.t
    "-T" is on the #! line, it must also be used on the command line at t/perfect.t line 1.

Ah, OK. I have to pass the -T switch on the command line as well; let's do that.

    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > perl -Mblib -T t/perfect.t
    Insecure dependency in require while running with -T switch at t/perfect.t line 5.
    BEGIN failed--compilation aborted at t/perfect.t line 5.
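Some background on that message: perlsec documents that adding a tainted string to @INC produces exactly this "Insecure dependency in require" error, and suggests -I on the command line or `use lib` instead, since PERL5LIB is ignored under -T. A plausible (unverified) guess is that blib.pm builds its @INC entries from the current working directory, which taint mode does not trust. The usual idiom for vouching for a value is a regex capture; a minimal sketch, where untaint_dir() and its allowed character set are assumptions of this example:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use Scalar::Util qw(tainted);

# Hypothetical helper: return an untainted copy of $dir if it looks like a
# plain absolute Unix path, or undef otherwise. Under -T, whatever a regex
# capture matched comes out untainted; that is how you vouch for a value.
# The allowed character class is an assumption; adjust it to your system.
sub untaint_dir {
    my ($dir) = @_;
    return unless defined $dir;
    my ($clean) = $dir =~ m{\A(/[\w./-]*)\z} or return;
    return $clean;
}

# Compare the output with and without perl -T (nothing is tainted without it).
my $blib_lib = getcwd() . '/blib/lib';
printf "taint mode: %s\n", ${^TAINT} ? 'on' : 'off';
printf "raw path tainted: %s\n", tainted($blib_lib) ? 'yes' : 'no';
if ( defined( my $clean = untaint_dir($blib_lib) ) ) {
    printf "checked path tainted: %s\n", tainted($clean) ? 'yes' : 'no';
    unshift @INC, $clean;    # an untainted entry is acceptable to require under -T
}
```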

WTF? So ExtUtils::Command::MM turns the "insecure dependencies" into secure ones? I won't dig into that any further; I'm interested in what happens with that dratted test script. I remove the -T switch from the shebang line in t/perfect.t. Next run:

    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > perl -Mblib t/perfect.t
    1..2
    ok 1 - use HTML::Tidy;
    ok 2 - The object isa HTML::Tidy
    Segmentation fault

Segfault? Let's fire up the debugger:

    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 > gdb perl
    GNU gdb Red Hat Linux (6.6-16.fc7rh)
    Copyright (C) 2006 Free Software Foundation, Inc.
    ...
    (no debugging symbols found)
    Using host libthread_db library "/lib/libthread_db.so.1".
    (gdb) run -Mblib t/perfect.t
    Starting program: /usr/bin/perl -Mblib t/perfect.t
    (no debugging symbols found)
    ...
    [Thread debugging using libthread_db enabled]
    [New Thread -1208416576 (LWP 29388)]
    (no debugging symbols found)
    (no debugging symbols found)
    1..2
    ok 1 - use HTML::Tidy;
    ok 2 - The object isa HTML::Tidy
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1208416576 (LWP 29388)]
    0x00147473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
    (gdb) bt
    #0 0x00147473 in tidyBufFree () from /usr/lib/libtidy-0.99.so.0
    #1 0x001162e0 in XS_HTML__Tidy__tidy_messages (my_perl=0x9d9b008, cv=0x9efa564) at Tidy.xs:99
    #2 0x0208833d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
    #3 0x0208179f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
    #4 0x0202710e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
    #5 0x0804921e in main ()
    (gdb) q
    The program is running. Exit anyway? (y or n) y
    qwurx [shmem] ~/rpms/perl/src/HTML-Tidy-1.08 >

So the crash happens inside libtidy's tidyBufFree(), and I won't debug that. But as an interesting side note: did you notice all the steps it took to reach that conclusion? Now tell me that the Perl testing interface is great and doesn't suck. Expect a rant from me about that any time soon.

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

In reply to Re: HTML::Tidy crashes with doctype declaration by shmem
in thread HTML::Tidy crashes with doctype declaration by wfsp
