HTML::Validator seems like a candidate, but its latest update was in 2000. You could also look at HTML::Parser. But I think what you want is HTML::Tidy, which bills itself as "(X)HTML validation in a Perl object".
Open source software? Share and enjoy. Make a profit from it if you can. Yet, share and enjoy!
A bit more information would be helpful.
Under what circumstances are you trying to find ill-formatted HTML tags?
I ask because one case that occurs to me is building or proofreading raw HTML on a local machine. If that's your situation, the W3C validator is easy, free, and gives up-to-date results.
Driving the W3C validator from Perl is more complex in that case, but it's doable. Write your script to:
- read your HTML file into memory
- connect to the W3C validator and send it the page (IIRC this won't work on a mere fragment). The validator accepts a URI, a file upload, or direct input (cut'n'paste). Capture the response with one of the usual suspects; search the Monastery for 'web scraping' for one set of ideas.
- display the returned errors, warnings, or 'good to go' message to the user, or spit 'em out to dead trees or whatever.
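The three steps above might look roughly like this in Perl. This is a minimal sketch, not a tested client, and it rests on assumptions you should check against the validator's current web-service documentation: that the classic Markup Validator endpoint https://validator.w3.org/check accepts pasted markup in a `fragment` field sent as an ordinary form POST, and that it reports its verdict in the X-W3C-Validator-Status, X-W3C-Validator-Errors and X-W3C-Validator-Warnings response headers. It uses core HTTP::Tiny; HTTPS additionally needs IO::Socket::SSL and Net::SSLeay installed. The sub names (slurp, submit_markup, summarize_response, run) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# ASSUMPTION: classic W3C Markup Validator endpoint taking a "fragment" form
# field and answering with X-W3C-Validator-* headers. Verify before use.
my $VALIDATOR = 'https://validator.w3.org/check';

# Step 1: read the whole HTML file into memory.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    local $/;    # undef the record separator to read everything at once
    return scalar <$fh>;
}

# Step 2: send the page as direct input and return HTTP::Tiny's response hash.
sub submit_markup {
    my ($html) = @_;
    my $ua = HTTP::Tiny->new(agent => 'html-check-sketch/0.1');
    return $ua->post_form($VALIDATOR, { fragment => $html });
}

# Step 3: boil the response down to a status plus error/warning counts.
# HTTP::Tiny lower-cases response header names.
sub summarize_response {
    my ($res) = @_;
    return +{
        status   => 'Abort',
        errors   => undef,
        warnings => undef,
        detail   => "$res->{status} $res->{reason}",
    } unless $res->{success};

    my $h = $res->{headers};
    return +{
        status   => $h->{'x-w3c-validator-status'} // 'Unknown',
        errors   => $h->{'x-w3c-validator-errors'},
        warnings => $h->{'x-w3c-validator-warnings'},
    };
}

# Report the outcome for one file.
sub run {
    my ($path) = @_;
    my $summary = summarize_response(submit_markup(slurp($path)));
    if ($summary->{status} eq 'Valid') {
        print "$path: good to go\n";
        return;
    }
    printf "%s: %s (errors: %s, warnings: %s)\n", $path, $summary->{status},
        $summary->{errors} // '?', $summary->{warnings} // '?';
    print "$summary->{detail}\n" if defined $summary->{detail};
}

# Only hit the network when a file name is given on the command line.
run($ARGV[0]) if @ARGV;
```

Run it as perl validate.pl page.html (the script name is arbitrary). With no argument it does nothing, so the subs can be loaded and exercised separately; keeping the header parsing in its own sub also means only summarize_response needs changing if the service's reply format differs from what's assumed here.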
At a minimum, you get standards-based validation this way; Tidy has its own (configurable, within limits) notions of valid HTML and, as noted above, HTML::Validator may have some outdated ones.
I recently tried cleaning up the following HTML and ran into a problem. The closing quote was missing on an attribute value.
use strict;
use warnings;
use HTML::Tidy;
my $html="<a href=\"mailto:test\@test.net>Email Us";
my $tidy = HTML::Tidy->new();
my $clean = $tidy->clean($html);
print "Clean HTML:\n---------------------------\n$clean\n";
This produced incorrect output.
Unreadable!
Please use <code>...</code> tags, and please read Markup in the Monastery.
Now, judging from the XML view, what you posted appears to be intended to render this way:
I recently tried cleaning up the following HTML and ran into a problem. The closing quote was missing on an attribute value.
use strict;
use warnings;
use HTML::Tidy;
my $html="<a href=\"mailto:test\@test.net><font size=4>Email Us</a>";
my $tidy = HTML::Tidy->new();
my $clean = $tidy->clean($html);
print "Clean HTML:\n---------------------------\n$clean\n";
This produced incorrect output.
So, what was the incorrect output? Please also read How do I post a question effectively?
...and on the off chance you don't know what's wrong with your $html, check the quoting, the missing </font> tag, and the H::T docs.
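As a rough illustration of those two problems in $html, here is a crude core-Perl check that flags an attribute value whose opening double quote is never closed before the tag's >, and start tags that are never closed. It is a regex heuristic, not an HTML parser (a browser would let the runaway quote swallow the markup that follows), so use HTML::Tidy or the W3C validator for real work. check_tags and the void-element list are my own for this example, not part of any module.

```perl
use strict;
use warnings;

# Void elements never take an end tag, so they are never pushed on the stack.
my %VOID = map { $_ => 1 } qw(area base br col embed hr img input link meta source track wbr);

# Crude heuristic: return a list of problems found in $html (empty list = none).
sub check_tags {
    my ($html) = @_;
    my (@problems, @open);
    while ($html =~ /<(\/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>/g) {
        my ($closing, $name, $attrs) = ($1, lc($2), $3);

        # An odd number of double quotes inside a tag means a value was never closed.
        my $quotes = () = $attrs =~ /"/g;
        push @problems, "unbalanced quote in <$name$attrs>" if $quotes % 2;

        if (!$closing) {
            # Remember non-void, non-self-closing start tags.
            push @open, $name unless $VOID{$name} || $attrs =~ m{/\s*$};
        }
        elsif (grep { $_ eq $name } @open) {
            # Anything opened after $name and not yet closed is reported as missing.
            push @problems, "missing </" . pop(@open) . ">" while $open[-1] ne $name;
            pop @open;
        }
        else {
            push @problems, "stray </$name>";
        }
    }
    # Start tags still open at the end were never closed.
    push @problems, "missing </$_>" for reverse @open;
    return @problems;
}

my $broken = "<a href=\"mailto:test\@test.net><font size=4>Email Us</a>";
print "$_\n" for check_tags($broken);
# unbalanced quote in <a href="mailto:test@test.net>
# missing </font>

my $fixed = '<a href="mailto:test@test.net"><font size="4">Email Us</font></a>';
my @left = check_tags($fixed);
print(@left ? "still broken\n" : "looks balanced\n");
```

The corrected string closes the href quote, quotes the size value, and adds the </font>; feeding that to HTML::Tidy's clean should give it far less to guess about.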