Re^2: HTML::Tidy and mysterious HTML::Tidy::Document

by Cody Pendant (Prior)
on Jan 18, 2005 at 00:19 UTC (#422853)

in reply to Re: HTML::Tidy and mysterious HTML::Tidy::Document
in thread HTML::Tidy and mysterious HTML::Tidy::Document

Hmm, I guess that's one interpretation. It doesn't seem to be doing it though.

I should give some code, shouldn't I? OK, I have a file which, when I run tidy test.html on the command line, gives three warnings and makes three changes.

line 5 column 1 - Warning: <style> inserting "type" attribute
line 11 column 1 - Warning: trimming empty <p>
line 11 column 4 - Warning: trimming empty <p>

But this script does nothing, generating no warnings and reproducing "test.html" exactly the same as before.

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use HTML::Tidy;

undef $/;
open(M,"test.html") || die "$!";
my $html = <M>;
my $tidy = new HTML::Tidy;
$tidy->clean( "this file", $html );
for my $message ( $tidy->messages ) {
    print $message->as_string . "\n";
}
print $html;

=~y~b-v~a-z~s; print

Replies are listed 'Best First'.
Re^3: HTML::Tidy and mysterious HTML::Tidy::Document
by Aristotle (Chancellor) on Jan 18, 2005 at 07:40 UTC

    Someone needs to ping petdance (with a patch?). The documentation lies. Here's working code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Tidy;

    my $fname = join ' ', @ARGV;
    my $html = do { local $/; <> }; # slurp file(s) from commandline
    my $tidy = HTML::Tidy->new();
    $tidy->parse( $fname, $html );
    warn $_->as_string, "\n" for $tidy->messages;
    print $tidy->clean( $html );

    Makeshifts last the longest.

      Thank you. I thought I was going crazy for a while there.

      For the record, just in case it hasn't been made clear, doing

      $tidy->clean( $html );
      leaves $html untouched.

      Next question, how does one pass command-line options to tidy via the module? Or should I quit while I'm ahead?

      =~y~b-v~a-z~s; print

        Looks like you can't. I skimmed the TidyLib docs for the functions that configure its behaviour, but I don't see any calls to those in the HTML::Tidy sources. That really is a bummer. I was hoping to use the module to write a sane replacement for the annoying standalone tidy binary. It would also have been nice to be able to integrate TidyLib directly into a Perl-enabled Vim.

        Makeshifts last the longest.

Re^3: HTML::Tidy and mysterious HTML::Tidy::Document
by bmann (Priest) on Jan 18, 2005 at 06:43 UTC
    Thanks for posting the code.

    If you want to use the messages method, you need to parse it first, not clean it.

    #!/usr/bin/perl
    use lib '/home/brian/lib/lib/perl/5.8.4';
    use strict;
    use warnings;
    use HTML::Tidy;

    open M, "test.html" or die "$!";
    my $html = do { local $/; <M> };
    my $tidy = new HTML::Tidy;
    $tidy->parse( "test", $html );
    for my $message ( $tidy->messages ) {
        print $message->as_string, $/;
    }
    __END__
    output on a test file:
    test (1:1) Warning: missing <!DOCTYPE> declaration
    test (8:9) Warning: missing </form> before <option>
    test (6:1) Warning: <option> isn't allowed in <body> elements
    test (6:1) Warning: <input> isn't allowed in <body> elements
    test (12:33) Warning: inserting implicit <form>
    test (14:17) Warning: discarding unexpected </option>
    test (12:33) Warning: <form> lacks "action" attribute

    If you want the cleaned output, it is edited in-place, i.e.:

    $tidy->clean( $html ); # $html now contains tidied output

    Update: the clean method returns the cleaned HTML rather than editing its argument, as Aristotle points out elsewhere in this thread.
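
The confusion in this subthread comes down to two calling conventions a Perl sub or method can follow: returning a new value, or rewriting the caller's variable through the @_ alias. The sketch below only illustrates that difference. The subs clean_returning and clean_in_place are hypothetical stand-ins, not part of HTML::Tidy, and the whitespace collapsing is a placeholder for real tidying. Going by the update above, clean follows the first convention, so its return value has to be captured or printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT part of HTML::Tidy. They only illustrate
# the two calling conventions discussed in the thread.

# Convention 1: return the result and leave the caller's variable alone.
sub clean_returning {
    my ($text) = @_;       # work on a copy of the argument
    $text =~ s/\s+/ /g;    # placeholder "tidying": collapse whitespace
    return $text;
}

# Convention 2: rewrite the caller's variable through the @_ alias.
sub clean_in_place {
    $_[0] =~ s/\s+/ /g;    # $_[0] aliases the caller's scalar
    return;                # nothing useful is returned
}

my $html = "<p>some    text</p>";

# Only $returned holds the collapsed text; $html is unchanged.
my $returned = clean_returning($html);

# $copy itself is rewritten by the call.
my $copy = $html;
clean_in_place($copy);

print "original: $html\n";
print "returned: $returned\n";
print "in place: $copy\n";
```

With a method of the first kind, calling it in void context and then printing the original variable reproduces the input unchanged, which matches what the original script in this thread did.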

Re^3: HTML::Tidy and mysterious HTML::Tidy::Document
by ww (Archbishop) on Jan 18, 2005 at 02:22 UTC
    A possible partial answer (I can't find the reference in my current brain-dead condition, but I believe I read this about invocation from the command line): are you sure Tidy is not writing the (allegedly) corrected file with an alternate or additional extension, e.g. "test.html.tidy" or "test.tidy"?

    Then again, this may be a mere brain-fart, or I may be confusing this with a document about the executable rather than the module.

      I can only repeat, there are only six methods in total, and none of them output anything at all in the way of HTML, only zero if everything's OK and a list of errors if not. There's no output() method or the like. The only guess we've got is that clean($string) edits $string, but that seems not to be the case from my testing.

      =~y~b-v~a-z~s; print
