tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:
HTML::TreeBuilder strips the quotes around numeric attribute values, as can be seen in the example below, which includes the test I want to pass.
According to my client, without quotes around all attribute values we have valid HTML but not valid XHTML. One workaround would be to change the behavior of HTML::TreeBuilder, or perhaps of the as_HTML method of HTML::Element. Another would be an HTML-to-XHTML converter, if anyone can recommend one.
I appreciate any leads for solving this.
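On the first workaround: before patching anything, it may be worth trying the as_XML method of HTML::Element, which as far as I know always double-quotes attribute values. What follows is only a minimal sketch, assuming the same table snippet as in my example and assuming that XML-style serialization is acceptable to the client. It adds no DOCTYPE and does none of the other cleanup a real converter would do.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Same snippet as in the question; the variable names are illustrative only.
my $html = '<table> <tr> <td valign="top" width="10"></td> </tr> </table>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the tree is finalized

# as_XML serializes with XML rules, so numeric values such as 10
# should keep their quotes (assumption: default TreeBuilder settings).
my $xml = $tree->as_XML;
print $xml;

$tree->delete;    # release the tree once the string has been produced
```

Whether this output is close enough to XHTML for a given client is a separate question.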
    use strict;
    use warnings;
    use HTML::TreeBuilder;
    use HTML::Element;
    use Test::More qw(no_plan);

    my $html = '<table> <tr> <td valign="top" width="10"></td> </tr> </table>';

    my $treeroot  = HTML::TreeBuilder->new;
    my $html_tree = $treeroot->parse($html);
    $treeroot->eof;    # tell the parser that the input is complete

    # The quotes around 10 get stripped away.
    # (The quotes around top are kept.)
    print $html_tree->as_HTML();

    my $wanted = q{<html><head></head><body><table><tr><td valign="top" width="10"></td></tr></table></body></html>};
    my $got    = $html_tree->as_HTML();
    is($got, $wanted);

*********************
UPDATE: script to do this with tidy, quoting from the how-to linked to above.

"The spaces are important and so is every consonant. What does all that code mean? First, tidy identifies the program to use. -asxml instructs Tidy to convert the HTML document to XHTML. -m tells the program to modify the document in its current location, and c:\XHTML\tidy.htm is the location of the messy document to be converted."

    my (@files, @rippers);
    @files = Filecontrol::get_files($dir);
    foreach my $file (@files) {
        # Convert each file to XHTML in place.
        `tidy -asxml -m $file`;
    }
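The backtick loop passes each filename through the shell and discards tidy's exit status. A slightly more defensive variant is sketched here. The helper name tidy_to_xhtml is hypothetical, the file list is taken from @ARGV instead of my own Filecontrol module, and tidy is assumed to be on the PATH. It relies on tidy's documented exit codes: 0 for success, 1 for warnings, 2 for errors.

```perl
use strict;
use warnings;

# Hypothetical helper: runs "tidy -asxml -m" on each file and returns
# the files for which tidy could not be run or reported errors.
sub tidy_to_xhtml {
    my @files = @_;
    my @failed;
    for my $file (@files) {
        # The list form of system bypasses the shell, so spaces or
        # shell metacharacters in $file are passed through untouched.
        system('tidy', '-asxml', '-m', $file);

        # $? is -1 if tidy could not be started, the low 7 bits are set
        # if it was killed by a signal, and the exit code is $? >> 8.
        if ($? == -1 || ($? & 127) || ($? >> 8) >= 2) {
            push @failed, $file;
        }
    }
    return @failed;
}

my @failed = tidy_to_xhtml(@ARGV);
warn "tidy reported errors for: @failed\n" if @failed;
```

Since -m overwrites files in place, running this on copies first seems prudent.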