cormanaz has asked for the wisdom of the Perl Monks concerning the following question:
The first is when the page doesn't declare its encoding and it is actually UTF-8. Since I don't know the encoding, I can't name it for Encode's decode method. When I later try to decode entities with HTML::Entities, it complains: "Parsing of undecoded UTF-8 will give garbage when decoding entities at C:/Perl/site/lib/LWP/Protocol.pm line 114." Is there some way to detect the encoding of a response object if it is undeclared?
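A minimal sketch of one common heuristic for the undeclared case, using only the core Encode module: try a strict UTF-8 decode first, and fall back to a single-byte encoding if the bytes are not valid UTF-8. The helper name `decode_undeclared` and the `cp1252` fallback are assumptions made for illustration; this guesses, it does not truly detect.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes whose charset was not declared.
# Returns the decoded text and the name of the encoding that was used.
sub decode_undeclared {
    my ($bytes, $fallback) = @_;
    $fallback ||= 'cp1252';    # assumed fallback, adjust for your pages

    # FB_CROAK makes decode() die on malformed input instead of
    # substituting replacement characters; LEAVE_SRC keeps $bytes intact
    # so the fallback below can still use it.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return ($text, 'UTF-8') if defined $text;

    # Not valid UTF-8: treat the bytes as the fallback encoding.
    return (decode($fallback, $bytes), $fallback);
}

# With LWP the raw bytes would come from $response->content.
my ($text, $used) = decode_undeclared("caf\xC3\xA9");
```

Pure ASCII input passes the UTF-8 check as well, which is harmless because ASCII decodes identically under both encodings. Once the text is decoded to characters, the entity decoding step should no longer see undecoded UTF-8.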
The second quandary is what to do when a page lists its encoding as, e.g., Windows 125x (where x is some number). I think it's an error to declare that as an encoding because it's really a charset, but that's the web for you. Anyway, I still have the problem of what to name as the actual encoding for the benefit of the decode method. Does anyone have experience with this?
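For what it's worth, Encode's alias table accepts labels of the form `windows-125x` directly and maps them to its own `cp125x` names, so the declared label can usually be passed to decode as-is. A small sketch, where the helper name `canonical_encoding` is hypothetical and only `find_encoding` and `decode` come from Encode:

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: map a declared charset label to Encode's
# canonical name, or return undef if Encode does not recognise it.
sub canonical_encoding {
    my ($label) = @_;
    my $enc = find_encoding($label);
    return $enc ? $enc->name : undef;
}

my $name = canonical_encoding('windows-1252');

# Bytes 0x93 and 0x94 are the curly double quotes in Windows-1252.
my $text = decode('windows-1252', "\x93quoted\x94");
```

Checking the label with `find_encoding` first avoids a fatal error from decode when a page declares something Encode has never heard of.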
TIA....Steve
Re: Pesky html encoding problems
by shmem (Chancellor) on Apr 29, 2007 at 08:04 UTC