in reply to LWP gives funky characters

Two possibilities.

Re^2: LWP gives funky characters
by jhanna (Scribe) on Jan 25, 2007 at 01:05 UTC
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $response = $ua->get($ARGV[0]);
    $_ = $response->content;

    Then I take the resulting page, find the DIV of interest, and extract the text. That text goes in the database. Later, another page selects the text and puts it in a textarea.

    The page is coming in text/html; charset=UTF-8, and my destination page is also text/html; charset=UTF-8.

    I'll look at Encode / decode('UTF-8'... that might be exactly what I need.
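For reference, a minimal sketch of what decoding with Encode looks like. It assumes the fetched bytes really are UTF-8; the sample string stands in for `$response->content` so it runs without a network fetch. HTTP::Response also offers `decoded_content`, which picks the charset from the Content-Type header, and may be the simpler route.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for $response->content: the raw UTF-8 bytes for "café".
my $bytes = "caf\xC3\xA9";

# Turn the bytes into a Perl character string.
my $text = decode('UTF-8', $bytes);

my $byte_len = length $bytes;   # counts bytes
my $char_len = length $text;    # counts characters

# Encode again just before the text leaves the program
# (database, file, or HTTP response).
my $out = encode('UTF-8', $text);
```

The point of the round trip is to decode once on the way in and encode once on the way out; any extra encode in between is what produces the funky characters.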

    Thanks.

      Then it's the first possibility. You're missing:

      print $cgi->header(-type=>'text/html', -charset=>'UTF-8');

      You probably have

      print $cgi->header(-type=>'text/html');

      which is the same as

      print $cgi->header(-type=>'text/html', -charset=>'ISO-8859-1');

      That tells the browser you are using one character set when you are using another.

        Hmmm... I'm pretty stuck.

        As I look at it, I think Perl is probably doing the right thing, but the UTF-8 string is getting double-encoded somewhere else. Thanks, though. That's what I get for going perl -> sql -> aspx -> javascript -> dhtml.... I'm hoping I can fix it coming out of aspx or in the javascript.
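To make the double-encoding idea concrete, here is a small sketch of how it can arise and how it can sometimes be undone. It assumes the misbehaving stage treats the UTF-8 bytes as ISO-8859-1; in practice the culprit is often Windows-1252, and which stage in the chain does it is unknown here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# One character: e with acute accent (U+00E9).
my $text = "\x{E9}";

# Correct single encoding: two bytes, C3 A9.
my $once = encode('UTF-8', $text);

# A later stage misreads those bytes as ISO-8859-1 characters
# and encodes them as UTF-8 a second time: four bytes, C3 83 C2 A9.
my $misread = decode('ISO-8859-1', $once);
my $twice   = encode('UTF-8', $misread);

# Repair by reversing the extra step: decode as UTF-8,
# then write the characters back out as ISO-8859-1 bytes.
my $repaired = encode('ISO-8859-1', decode('UTF-8', $twice));
```

If the stored data looks like `$twice`, each accented character shows up as two odd characters in the browser. The repair only works when the extra step was lossless, so fixing the stage that re-encodes is the safer cure.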