
Re: HTML::Tidy and mysterious HTML::Tidy::Document

by BUU (Prior)
on Jan 17, 2005 at 23:20 UTC (#422847=note)

in reply to HTML::Tidy and mysterious HTML::Tidy::Document

Just from reading the documentation, it appears that the clean() method does an in-place edit of your string. Is this not true?

Re^2: HTML::Tidy and mysterious HTML::Tidy::Document
by Cody Pendant (Prior) on Jan 18, 2005 at 00:19 UTC

    Hmm, I guess that's one interpretation. It doesn't seem to be doing that, though.

    I should give some code, shouldn't I? OK, I have a file that, when I run tidy test.html on the command line, gives three warnings and makes three changes:

        line 5 column 1 - Warning: <style> inserting "type" attribute
        line 11 column 1 - Warning: trimming empty <p>
        line 11 column 4 - Warning: trimming empty <p>

    But this script does nothing: it generates no warnings and prints "test.html" exactly as it was before.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use diagnostics;
        use HTML::Tidy;

        undef $/;
        open(M,"test.html") || die "$!";
        my $html = <M>;
        my $tidy = new HTML::Tidy;
        $tidy->clean( "this file", $html );
        for my $message ( $tidy->messages ) {
            print $message->as_string . "\n";
        }
        print $html;

    =~y~b-v~a-z~s; print

      Someone needs to ping petdance (with a patch?). The documentation lies. Here's working code:

          #!/usr/bin/perl
          use strict;
          use warnings;
          use HTML::Tidy;

          my $fname = join ' ', @ARGV;
          my $html = do { local $/; <> }; # slurp file(s) from commandline
          my $tidy = HTML::Tidy->new();
          $tidy->parse( $fname, $html );
          warn $_->as_string, "\n" for $tidy->messages;
          print $tidy->clean( $html );

      Makeshifts last the longest.

        Thank you. I thought I was going crazy for a while there.

        For the record, just in case it hasn't been made clear, doing

            $tidy->clean( $html );

        leaves $html untouched.

        Next question: how does one pass command-line options to tidy via the module? Or should I quit while I'm ahead?

      Thanks for posting the code.

      If you want to use the messages() method, you need to call parse() on the HTML first, not clean().

          #!/usr/bin/perl
          use lib '/home/brian/lib/lib/perl/5.8.4';
          use strict;
          use warnings;
          use HTML::Tidy;

          open M, "test.html" or die "$!";
          my $html = do { local $/; <M> };
          my $tidy = new HTML::Tidy;
          $tidy->parse( "test", $html );
          for my $message ( $tidy->messages ) {
              print $message->as_string, $/;
          }
          __END__
          output on a test file:
          test (1:1) Warning: missing <!DOCTYPE> declaration
          test (8:9) Warning: missing </form> before <option>
          test (6:1) Warning: <option> isn't allowed in <body> elements
          test (6:1) Warning: <input> isn't allowed in <body> elements
          test (12:33) Warning: inserting implicit <form>
          test (14:17) Warning: discarding unexpected </option>
          test (12:33) Warning: <form> lacks "action" attribute
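      A note on the two slurp idioms in this thread: the original script does undef $/; at file scope, which switches off line-at-a-time reading for the rest of the program, while the replies use do { local $/; <M> }, which restores $/ as soon as the block ends. The loop in the script above quietly depends on that, since print $message->as_string, $/; only ends each message with a newline because $/ is still "\n" at that point. Below is a minimal core-Perl sketch that does not touch HTML::Tidy; slurp_local and slurp_global are hypothetical helper names, and a File::Temp file stands in for test.html.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper using the idiom from the replies:
# $/ is undef only inside the do block and is restored afterwards.
sub slurp_local {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> };
    close $in;
    return $content;
}

# Hypothetical helper using the pattern from the original script:
# a bare undef $/ changes the input separator for the whole program.
sub slurp_global {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    undef $/;
    my $content = <$in>;
    close $in;
    return $content;
}

# A small temporary file stands in for test.html.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "line 1\nline 2\n";
close $fh;

my $html = slurp_local($file);
print "local: ", length($html), " bytes; \$/ is ",
    (defined $/ ? "still defined" : "undef"), "\n";   # local: 14 bytes; $/ is still defined

$html = slurp_global($file);
print "global: ", length($html), " bytes; \$/ is ",
    (defined $/ ? "still defined" : "undef"), "\n";   # global: 14 bytes; $/ is undef

$/ = "\n";    # manual restore, which the local version never needs
```

      Both helpers read the same content; the difference is only what they leave behind in $/ for the code that runs afterwards.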

      If you want the cleaned output, it is edited in place, i.e.:

          $tidy->clean( $html ); # $html now contains tidied output

      Update: the clean() method returns the cleaned HTML, as Aristotle points out below.
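      Why clean( $html ) can leave $html alone comes down to how Perl passes arguments: the elements of @_ are aliases for the caller's variables, so a sub edits the caller's string in place only if it assigns through @_ directly (for a method, $_[1], since $_[0] is the object). A sub that copies its argument into a my variable and returns the result leaves the caller's variable untouched, which matches what the working code in this thread relies on (print $tidy->clean( $html );). A minimal core-Perl sketch; tidy_in_place and tidy_returning are hypothetical stand-ins, not HTML::Tidy methods, and the regex only mimics tidy's "trimming empty <p>" fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: edits the caller's string in place by assigning
# through $_[0], which Perl aliases to the caller's variable.
sub tidy_in_place {
    $_[0] =~ s/<p>\s*<\/p>//g;    # mimic "trimming empty <p>"
    return;
}

# Hypothetical sub: copies its argument, edits the copy and returns it;
# the caller's variable is never modified.
sub tidy_returning {
    my ($html) = @_;
    $html =~ s/<p>\s*<\/p>//g;
    return $html;
}

my $html = "<body><p></p>text</body>";

my $copy = $html;
tidy_returning($copy);            # return value discarded, $copy unchanged
print "$copy\n";                  # <body><p></p>text</body>

my $clean = tidy_returning($html);
print "$clean\n";                 # <body>text</body>

tidy_in_place($html);             # $html itself is modified
print "$html\n";                  # <body>text</body>
```

      With a returning method, calling it in void context, as the original script does with clean(), throws the tidied result away.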

      A possible partial answer (I can't find the reference right now, but I believe I read this about invoking tidy from the command line): are you sure Tidy isn't writing the (allegedly) corrected file under an alternate or additional extension, e.g. "test.html.tidy" or "test.tidy"?

      Then again, this may be a mere brain-fart on my part, or confusion with a document about the executable rather than the module.

        I can only repeat: there are only six methods in total, and none of them outputs any HTML at all, only zero if everything's OK and a list of errors if not. There's no output() method or the like. The only guess we've got is that clean($string) edits $string, but my testing suggests that isn't the case.

