When I tried your script, I got this error message:
Can't locate object method "decoded_content" via package "HTTP::Headers"
(update: actually, I tried it with both "decode_content" and "decoded_content" -- both yielded the same sort of error)

But when I ran it with Data::Dumper and dumped the contents of $response, I could see that it had plenty of utf8 data with lots of Chinese characters.

I even upgraded LWP::UserAgent from 2.024 to 2.033 (the current version as of this writing), but got the same error. Did you happen to get that error as well? (It would have been worthwhile to say so.)

If I just use the method "content" (instead of "decoded_content"), I see a lot of page content. Did you try that? Is there some reason why the output of "content" isn't what you really want?
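For what it's worth, here is a minimal sketch of how one might fall back to "content" when "decoded_content" is not available. As far as I know, HTTP::Message hands unknown methods over to its HTTP::Headers object, which would explain why the error names that package when the installed libwww-perl predates decoded_content. The helper name response_text is made up for illustration, and the UTF-8 default is an assumption about the page, not something the response told us.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of LWP): return character data from any
# object that offers content() and, on newer installs, decoded_content().
sub response_text {
    my ($response, $charset) = @_;
    $charset ||= 'UTF-8';    # assumption: the page is UTF-8 encoded

    # Newer libwww-perl: let the response decode itself.
    return $response->decoded_content
        if $response->can('decoded_content');

    # Older libwww-perl: content() hands back raw bytes, so decode them
    # exactly once here.
    return decode($charset, $response->content);
}
```

With a real response this would be called as response_text($ua->get($url)). Note that calling decode() on the output of content() is a single decode, provided content() really returns raw bytes; it would only be a double decode if applied to a string that had already been decoded.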

Another update: I forgot to comment on this:

Using content() works but gives (at least in console) garbled data. Doing decode("utf8", $response->content) looks like doubly decoding.

Are you sure you are using a utf8-capable console, with an appropriate unicode font that includes Chinese characters? You might try this little unicode transliterator script -- run the original data through that (without decode('utf8',...)) to see if it really is garbled. (It doesn't look garbled at all in my Mac OS X "Terminal" window -- but I know better than to try pushing it through a traditional xterm.)
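If that script isn't handy, the same kind of check can be sketched in a few lines. This is not the transliterator script mentioned above, just a minimal stand-in under the assumption that the data is UTF-8. It prints each character as a U+XXXX code point, so the result is readable on any console, whatever its font or encoding. The helper name codepoints is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a UTF-8 byte string once and list its
# characters as U+XXXX code points (plain ASCII output).
sub codepoints {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);    # bytes -> characters, once
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

# These six bytes are the UTF-8 encoding of two Chinese characters.
print codepoints("\xE4\xB8\xAD\xE6\x96\x87"), "\n";    # prints: U+4E2D U+6587
```

If the code points that come out are in the CJK range (U+4E00 and up), the data is fine and the garbling is happening in the console. If they come out as a long run of values between U+0080 and U+00FF, the bytes were probably encoded twice somewhere along the way.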


In reply to Re: Chinese site and decoded_content() trouble by graff
in thread Chinese site and decoded_content() trouble by isync
