cw010000 has asked for the wisdom of the Perl Monks concerning the following question:

I have been trying for several months to build a screen-scraping application using LWP. Everything works fine except one crucial module, which uses an LWP POST request to get information back from a form with a POST action. Even that works fine with single-byte characters, such as those in Western European alphabets. However, it fails miserably with multibyte characters, sending back zeroes wherever there are multibyte characters. It thinks it has succeeded, and a count of the zero characters and the intervening spaces, commas, etc. shows that it knows the spacing of the characters, but nothing I have tried, nor anything I have been advised on the Web to try, works. It may be a bug in CPAN's version of LWP::UserAgent, but it might also be some hidden secret of which I...and apparently everyone else on the Web...am unaware. HELP!

Replies are listed 'Best First'.
Re: LWP: POST request
by moritz (Cardinal) on Mar 11, 2008 at 14:45 UTC
    You have to extract the charset information from the Content-Type header, if any is present, and use Encode::decode to decode the response with that charset.

    If there is no charset info in the header, you can still look in the HTML for an http-equiv meta tag, or guess the encoding with Encode::Guess (that should be the last fallback solution, since it's unreliable).

    And be sure to read perluniintro and perlunifaq.
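
The fallback order described in this reply could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names charset_for and decode_body are made up for the example, the regexes are deliberately simple, and only the core Encode and Encode::Guess modules are assumed.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);
use Encode::Guess;    # exports guess_encoding

# Hypothetical helper: choose a charset name for a response body.
sub charset_for {
    my ($content_type, $bytes) = @_;

    # 1. charset parameter of the Content-Type header, if present
    if (defined $content_type
        && $content_type =~ /charset\s*=\s*"?([\w.:-]+)/i) {
        return $1;
    }

    # 2. charset named in a <meta> tag inside the HTML
    if ($bytes =~ /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i) {
        return $1;
    }

    # 3. last resort: guess (unreliable); a non-reference result is an error message
    my $guess = guess_encoding($bytes);
    return ref $guess ? $guess->name : undef;
}

# Hypothetical helper: turn raw response bytes into a Perl character string.
# Returns undef when no usable charset is found.
sub decode_body {
    my ($content_type, $bytes) = @_;
    my $charset = charset_for($content_type, $bytes);
    return undef unless defined $charset && find_encoding($charset);
    return decode($charset, $bytes);
}
```

With LWP, the two arguments would typically come from $response->header('Content-Type') and $response->content. HTTP::Response also offers a decoded_content method that applies the declared charset for you, which may be enough when the server labels its responses correctly.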

Re: LWP: POST request
by Anonymous Monk on Mar 11, 2008 at 14:29 UTC