Re: Chinese site and decoded_content() trouble

When I tried your script, I got this error message:

Can't locate object method "decoded_content" via package "HTTP::Headers"

(update: actually, I tried it with both "decode_content" and "decoded_content" -- both yielded the same sort of error)

But when I ran it with Data::Dumper and dumped the contents of $response, I could see that it had plenty of utf8 data with lots of Chinese characters.

I even upgraded LWP::UserAgent from 2.024 to 2.033 (the current version as of this writing), but got the same error. Did you happen to get that error as well? (It would have been worthwhile to say so.)

If I just use the method "content" (instead of "decoded_content"), I see a lot of page content. Did you try that? Is there some reason why the output of "content" isn't what you really want?

Another update: I forgot to comment on this:

Using content() works but gives (at least in console) garbled data. Doing decode("utf8", $response->content) looks like doubly decoding.

Are you sure you are using a utf8-capable console, with an appropriate unicode font that includes Chinese characters? You might try this little unicode transliterator script -- run the original data through that (without decode('utf8',...)) to see if it really is garbled. (Doesn't look garbled at all in my macosx "Terminal" window -- but I know better than to try pushing through a traditional xterm.)
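
If you want to take the console and font out of the picture, you can check the raw bytes directly. The following is only a rough sketch, not the transliterator script mentioned above: the helper names is_valid_utf8 and codepoints are made up for illustration, and the sample bytes are hard-coded here rather than fetched from the site. It tests whether a byte string is well-formed UTF-8 and prints each character as a code point, so nothing depends on what the terminal can render.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: render each character as U+XXXX, so no font is needed.
sub codepoints {
    my ($chars) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

# Assumed sample data: the UTF-8 bytes for the two characters U+4E2D U+6587.
my $bytes = "\xE4\xB8\xAD\xE6\x96\x87";

print is_valid_utf8($bytes) ? "valid UTF-8\n" : "not valid UTF-8\n";
print codepoints(decode('UTF-8', $bytes)), "\n";
```

If the bytes from content() pass a check like this and the code points are in the CJK range, the data itself is probably fine and the garbling is happening at the display end.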


Re^2: Chinese site and decoded_content() trouble by varian (Chaplain) on Jun 09, 2007 at 08:57 UTC
Can't locate object method "decoded_content"

The method decoded_content is not located in LWP but in HTTP::Message, which is accessed indirectly via LWP. The method is available in version 1.57 of this module (see CPAN). Apparently it was added fairly recently; my local version 1.42 does not yet provide it.
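
A quick way to see what a given installation offers is to ask the class itself. This is a minimal sketch: has_method is a hypothetical helper name, and it relies only on the standard can and VERSION methods that every Perl class inherits from UNIVERSAL.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the class (or object) provides the named method.
sub has_method {
    my ($class, $method) = @_;
    return $class->can($method) ? 1 : 0;
}

# HTTP::Message may not be installed everywhere, so load it defensively.
if (eval { require HTTP::Message; 1 }) {
    my $version = HTTP::Message->VERSION;
    $version = 'unknown' unless defined $version;
    printf "HTTP::Message version %s, decoded_content is %s\n",
        $version,
        has_method('HTTP::Message', 'decoded_content') ? 'available' : 'missing';
}
else {
    print "HTTP::Message is not installed\n";
}
```

If this reports the method as missing, upgrading the distribution that ships HTTP::Message should help, rather than upgrading LWP::UserAgent alone.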
Re^2: Chinese site and decoded_content() trouble by isync (Hermit) on Jun 11, 2007 at 08:54 UTC
"Is there some reason why the output of "content" isn't what you really want?" --actually yes! I'd like to get clean utf8, which requires to first possibly unzip gzipped content and then decode it properly from a local charset to the more universal utf8 representation (and decoded_content should do this in one simple call).	[reply]