Two possibilities.
The page you are getting is formatted in the same encoding as the one you are generating, but you haven't told the browser which encoding this is.
You'll need to do
print $cgi->header(-type=>'text/html', -charset=>'UTF-8');
The above sends the HTTP response header Content-Type: text/html; charset=UTF-8. It serves the same purpose as having the following META element in your HTML document.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
The page you are getting is formatted using one encoding, and yours is formatted in another.
Say the type of the page you are downloading is 'text/html; charset=UTF-8'.
Say the type of the page you are generating is 'text/html; charset=iso-8859-1' (Latin-1).
You'll need to do
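use Encode qw( decode encode );

# Declare the encoding of what we send (iso-8859-1 is the registered name for Latin-1).
print $cgi->header(-type=>'text/html', -charset=>'iso-8859-1');

my $utf8_html_from_src = ...;
# Bytes from the source -> Perl character string.
my $html_from_src = decode('UTF-8', $utf8_html_from_src);
my $html_to_send = process($html_from_src);
# Perl character string -> bytes in the encoding we declared.
my $latin_html_to_send = encode('iso-8859-1', $html_to_send);
print($latin_html_to_send);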
In this example, the encoding used by the source can represent more characters than the encoding used to deliver the content. Characters that the delivery encoding cannot represent may appear as question marks. Preventing that is harder.
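One way to avoid the question marks, as a minimal sketch: Encode accepts a fallback argument, and Encode::FB_HTMLCREF replaces each character the target encoding cannot represent with a decimal numeric character reference, which a browser will still display. The sample string and variable names below are made up for illustration, and this only makes sense where the output is HTML text.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical sample: e-acute (U+00E9) exists in Latin-1,
# the smiley (U+263A) does not.
my $text = "caf\x{E9} \x{263A}";

# Default behaviour: unrepresentable characters become "?".
my $lossy = encode('iso-8859-1', $text);

# With FB_HTMLCREF they become &#NNNN; instead.
# LEAVE_SRC keeps encode() from modifying $text in place.
my $safe = encode('iso-8859-1', $text,
                  Encode::FB_HTMLCREF | Encode::LEAVE_SRC);

print "$lossy\n";   # bytes: caf\xE9 ?
print "$safe\n";    # bytes: caf\xE9 &#9786;
```

Whether this is appropriate depends on where the text ends up. Inside a script block or an attribute that is later parsed by something other than an HTML parser, the character references would not be interpreted.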
In reply to Re: LWP gives funky characters
by ikegami
in thread LWP gives funky characters
by jhanna