vitoco has asked for the wisdom of the Perl Monks concerning the following question:
I'm using WWW::Mechanize (v1.54) to log into a site and extract some data from it. On some pages, I get the "Wide character in print" warning when saving them with $mech->save_content().
I noticed that those pages declare an explicit charset twice, and the two declarations don't match:
Content-type: text/html;charset=ISO-8859-1
in the HTTP response header, and
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
inside the HTML page.
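To make the mismatch concrete, here is a rough core-Perl sketch that pulls the charset out of each declaration and compares them. `declared_charsets` is a hypothetical helper, not part of WWW::Mechanize or LWP, and it is regex-based, so it will miss unusual markup that a real HTML parser would handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize or LWP): return the
# charset named in a Content-Type header value and the charset named in
# the first <meta http-equiv="Content-Type"> tag, both lowercased, or
# undef where none is declared. Regex-based and deliberately rough.
sub declared_charsets {
    my ($header_value, $html) = @_;

    my ($header_cs) = $header_value =~ /charset\s*=\s*["']?([\w.:-]+)/i;

    my $meta_cs;
    if ($html =~ /(<meta\s[^>]*http-equiv\s*=\s*["']?Content-Type["']?[^>]*>)/i) {
        my $tag = $1;    # copy the capture before running another match
        ($meta_cs) = $tag =~ /charset\s*=\s*["']?([\w.:-]+)/i;
    }

    return map { defined($_) ? lc($_) : undef } ($header_cs, $meta_cs);
}

# The two declarations from the page described above.
my ($header, $meta) = declared_charsets(
    'text/html;charset=ISO-8859-1',
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>',
);

if (defined $header && defined $meta && $header ne $meta) {
    print "charset conflict: header says $header, meta says $meta\n";
}
```

For the page above this prints `charset conflict: header says iso-8859-1, meta says utf-8`.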
Tracing the calls, I see that $mech->content_type() returns "text/html" and $mech->response()->encoding() returns "iso-8859-1".
It seems that save_content() (and probably other methods) should check not only the content type but also the encoding when deciding on binmode, or maybe HTTP::Response is not doing its job?
Or am I missing something?
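For reference, the warning itself comes from perl, not from Mechanize: it is raised when a string containing characters above 0xFF is printed to a filehandle that has no encoding layer. Below is a minimal core-Perl sketch (not Mechanize's own code) of two ways to write a decoded string without the warning; `save_with_layer` and `save_as_bytes` are hypothetical names, and UTF-8 as the output encoding is an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A decoded (character) string: U+00E9 fits in one byte, but U+20AC
# (euro sign) is above 0xFF. Printing it to a handle that has no
# encoding layer triggers perl's "Wide character in print" warning.
my $text = "caf\x{e9} \x{20ac}5";

# Hypothetical helper 1: let an :encoding layer turn characters into
# UTF-8 bytes as they are written.
sub save_with_layer {
    my ($path, $chars) = @_;
    open my $fh, '>:encoding(UTF-8)', $path or die "open $path: $!";
    print {$fh} $chars;
    close $fh or die "close $path: $!";
    return;
}

# Hypothetical helper 2: encode to UTF-8 bytes explicitly and write
# them through a :raw (binary) handle.
sub save_as_bytes {
    my ($path, $chars) = @_;
    open my $fh, '>:raw', $path or die "open $path: $!";
    print {$fh} encode('UTF-8', $chars);
    close $fh or die "close $path: $!";
    return;
}
```

Both helpers write the same bytes. With Mechanize, something like `save_with_layer('page.html', $mech->content)` could stand in for save_content(), assuming $mech->content returns the decoded page; whether UTF-8 is the right output encoding depends on which of the two conflicting declarations actually matches the page.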
Re: Explicit charset confuses WWW::Mechanize and/or HTTP::Response
by ikegami (Patriarch) on May 13, 2009 at 22:33 UTC
by Anonymous Monk on May 14, 2009 at 04:05 UTC
by ikegami (Patriarch) on May 14, 2009 at 14:24 UTC
Re: Explicit charset confuses WWW::Mechanize and/or HTTP::Response
by vitoco (Hermit) on May 15, 2009 at 15:52 UTC
by Polyglot (Chaplain) on May 16, 2009 at 14:49 UTC
by vitoco (Hermit) on May 18, 2009 at 22:30 UTC