GrandFather has asked for the wisdom of the Perl Monks concerning the following question:
As part of a build process I have a script that checks various HTML documents using HTML::Lint. Some of the recent documents use UTF-8 and, despite a content="text/html; charset=utf-8" attribute in the meta tag of the HTML head, HTML::Lint chokes on the UTF-8 characters. Is there a workaround for this?
Sample code follows:
    use strict;
    use warnings;
    use utf8;    # this source file is saved as UTF-8
    use HTML::Lint;

    # Report only structural errors
    my $lint = HTML::Lint->new (only_types => HTML::Lint::Error::STRUCTURE);

    # Slurp the sample document from the __DATA__ section
    my $html = do {local $/; <DATA>};

    $lint->parse ($html);
    $lint->eof ();

    # Collect the errors as strings and print them
    my @lintErrsOrg = map {$_->as_string ()} $lint->errors ();
    print join "\nError Lint org: ", @lintErrsOrg;

    __DATA__
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html lang="en">
    <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>utf8 test</title>
    </head>
    <body>
    <p>ç</p>
    </body>
    </html>
Prints:
    (8:1) Invalid character \xE7 should be written as &ccedil;
Update: ya, hai, that was also posted by me. :)
Re: HTML::Lint and utf-8 document woes
by graff (Chancellor) on Nov 01, 2006 at 03:57 UTC
by GrandFather (Saint) on Nov 01, 2006 at 04:02 UTC
by graff (Chancellor) on Nov 01, 2006 at 04:17 UTC
by rhesa (Vicar) on Nov 01, 2006 at 04:39 UTC

Re: HTML::Lint and utf-8 document woes
by rhesa (Vicar) on Nov 01, 2006 at 03:26 UTC
by GrandFather (Saint) on Nov 01, 2006 at 03:41 UTC
by rhesa (Vicar) on Nov 01, 2006 at 04:05 UTC