in reply to Re: HTTP::Request::Common::POST and UTF-8
in thread HTTP::Request::Common::POST and UTF-8

Won't that escape data twice? Without actually running it, it looks like
"\x{1234}"
would be transformed by uri_escape_utf8 into
"%E1%88%B4"
which would be transformed by POST into
"%25E1%2588%25B4"
while the right answer would be
"%E1%88%B4"
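
The double-escaping concern can be checked directly. The following is a minimal sketch, assuming URI::Escape is installed; the second call to uri_escape is only a stand-in for the escaping pass that POST applies to form values, not POST's actual internals.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8);

my $char = "\x{1234}";

# First pass: encode the character as UTF-8, then percent-escape the bytes.
my $once = uri_escape_utf8($char);

# Second pass (stand-in for POST's own escaping): every "%" from the
# first pass is itself escaped to "%25".
my $twice = uri_escape($once);

print "escaped once:  $once\n";
print "escaped twice: $twice\n";
```

If the second line of output contains "%25", the data was escaped twice and the server will see literal percent signs instead of the original character.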

What he actually needs is

use Encode qw(encode);
use HTTP::Request::Common qw(POST);

my $request = POST(
    "http://localhost/test",
    Content => [
        data      => encode("UTF-8", $utf8_data),
        more_data => "some more data",
    ],
);

The core problem is that the URL-encoded format didn't anticipate data using character sets other than US-ASCII. There is a de facto standard: encode the string as UTF-8, then percent-escape the resulting bytes the same way US-ASCII bytes would be escaped. The code above converts the string to UTF-8 bytes, which are then escaped by POST's guts.
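
The two steps of that de facto standard can be sketched on their own, outside of POST. This is an illustration only: the sample string is hypothetical, and uri_escape from URI::Escape is used here in place of whatever escaping POST does internally.

```perl
use strict;
use warnings;
use Encode qw(encode);
use URI::Escape qw(uri_escape uri_escape_utf8);

my $chars = "caf\x{e9}";    # hypothetical sample: a character string, not bytes

# Step 1: characters -> UTF-8 bytes.
my $bytes = encode("UTF-8", $chars);

# Step 2: percent-escape every byte outside the unreserved set.
my $escaped = uri_escape($bytes);

print "$escaped\n";

# uri_escape_utf8 performs both steps in a single call.
print "matches uri_escape_utf8\n" if $escaped eq uri_escape_utf8($chars);
```

When the value is handed to POST, only step 1 should be done by the caller, since POST takes care of the escaping.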

Re^3: HTTP::Request::Common::POST and UTF-8
by scollyer (Sexton) on Sep 28, 2005 at 18:48 UTC
    >Won't that escape data twice?

    Yup, just discovered that. Your solution appears to work correctly, with the corresponding unescaping being:

    decode("UTF-8", uri_unescape($req_string))
    Thanks for this.

    I think I'll go and hit myself with a stick now. It'll be less pain than doing UTF-8 in Perl ...

    Steve Collyer
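
The unescaping step mentioned in the reply above can be checked with a round trip. This is a minimal sketch, assuming URI::Escape and Encode are available; $req_string here is built locally to imitate a single escaped form value, not taken from a real request.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use URI::Escape qw(uri_escape uri_unescape);

my $original = "\x{1234}";

# Sender side: UTF-8 encode, then percent-escape (imitates the wire format).
my $req_string = uri_escape(encode("UTF-8", $original));

# Receiver side: undo the percent-escaping first, then decode the UTF-8 bytes.
my $decoded = decode("UTF-8", uri_unescape($req_string));

print $decoded eq $original ? "round trip ok\n" : "mismatch\n";
```

The order matters on the receiving side: uri_unescape yields raw bytes, and only after that can decode turn them back into characters.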