As part of a build process I have a script that checks various HTML documents using HTML::Lint. Some of the recent documents use UTF-8 and, despite a content="text/html; charset=utf-8" attribute on the meta tag in the HTML head, HTML::Lint chokes on the UTF-8 characters. Is there a workaround for this?
Sample code follows:
use strict;
use warnings;
use utf8;
use HTML::Lint;

my $lint = HTML::Lint->new (only_types => HTML::Lint::Error::STRUCTURE);
my $html = do {local $/; <DATA>};

$lint->parse ($html);
$lint->eof ();

my @lintErrsOrg = map {$_->as_string ()} $lint->errors ();
print join "\nError Lint org: ", @lintErrsOrg;

__DATA__
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>utf8 test</title>
</head>
<body>
<p>ç</p>
</body>
</html>
Prints:
(8:1) Invalid character \xE7 should be written as &ccedil;
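For reference, here is a rough sketch of two possible ways around the message above. It is not a confirmed fix. It assumes the HTML has already been decoded to Perl characters, that HTML::Entities (which ships with HTML::Parser) is available, and that HTML::Lint reports this complaint under the error code 'text-use-entity'; that code name is my assumption and should be checked against the installed HTML::Lint::Error. The sample string is a hypothetical stand-in for the real document.

```perl
use strict;
use warnings;
use HTML::Lint;
use HTML::Entities qw(encode_entities);

# Hypothetical stand-in for an already-decoded document (U+00E7 is c-cedilla).
my $html = "<html><head><title>utf8 test</title></head>"
         . "<body><p>\x{E7}</p></body></html>";

# Option 1: turn only non-ASCII characters into entities before linting.
# The second argument is a character class, so <, > and & are left alone.
my $ascii = encode_entities ($html, '^\x00-\x7F');

my $lint = HTML::Lint->new (only_types => HTML::Lint::Error::STRUCTURE);
$lint->parse ($ascii);
$lint->eof ();

# Option 2: lint as usual, then drop the entity complaints afterwards.
# 'text-use-entity' is assumed to be the code behind "Invalid character ...".
my @errors = grep {$_->errcode () ne 'text-use-entity'} $lint->errors ();

print $_->as_string (), "\n" for @errors;
```

Option 1 changes what HTML::Lint sees but not the file on disk, so line and column numbers in any remaining messages may shift slightly on lines that contained non-ASCII characters. Option 2 leaves positions intact but hides every complaint of that kind.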
Update: ya, hai, that was also posted by me. :)
In reply to HTML::Lint and utf-8 document woes by GrandFather