Re: HTTP::Request::Common::POST and UTF-8

I'm not sure if this will fit your needs, but here's one possible solution. I added one new line to your code and modified another, both marked with ### <--.

#!/usr/bin/perl

use strict;
use warnings;

use LWP;
use HTTP::Request::Common;
use Encode;
use charnames qw(greek);

use URI::Escape qw(uri_escape_utf8); ### <-- new line

binmode(STDOUT, ":utf8");

my $utf8_data = "<\N{alpha}\N{beta}\N{gamma}\N{delta}>";

print $utf8_data, "\n\n";

print Encode::is_utf8($utf8_data)
        ? "\$utf8_data marked as UTF-8\n\n"
        : "\$utf8_data not marked as UTF-8\n\n";

my $request = POST("http://localhost/test",
                    Content => [
                                data      => uri_escape_utf8($utf8_data), ### <-- modified line
                                more_data => "some more data",
                              ]
                  );

my $req_string = $request->as_string();

print Encode::is_utf8($req_string)
        ? "\$req_string marked as UTF-8\n\n"
        : "\$req_string not marked as UTF-8\n\n";

print $req_string, "\n";

Re^2: HTTP::Request::Common::POST and UTF-8 by scollyer (Sexton) on Sep 28, 2005 at 15:32 UTC
> I'm not sure if this will fit your needs, but here's one possible solution.

Thanks very much. That looks a lot better. (I needed to upgrade to a new version of URI::Escape, though.) I guess it might be nice if you could override the default escape routine inside POST, rather than doing it manually. It's also suitable for the real code.

Steve Collyer
Re^2: HTTP::Request::Common::POST and UTF-8 by ikegami (Patriarch) on Sep 28, 2005 at 16:57 UTC
Won't that escape the data twice? Without actually running it, it looks like `"\x{1234}"` would be transformed by `uri_escape_utf8` into `"%E1%88%B4"`, which `POST` would then transform into `"%25E1%2588%25B4"`, while the right answer is `"%E1%88%B4"`.

What he actually needs is

    my $request = POST(
        "http://localhost/test",
        Content => [
            data      => encode("UTF-8", $utf8_data),
            more_data => "some more data",
        ]
    );

The core problem is that the url-encoded format didn't anticipate data using character sets other than US-ASCII. There is a de facto standard: encode the string as UTF-8, then escape the resulting bytes as if they were US-ASCII. The code above converts the string to UTF-8 bytes, which are subsequently escaped by POST's guts.
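
For anyone who wants to see the double escaping without firing up a server, here is a minimal sketch. It assumes that the second `uri_escape` call is a fair stand-in for the escaping POST applies to form values; that is an assumption of the sketch, not something checked against the LWP source. It uses a single Greek alpha so the byte values are easy to verify by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw(encode);
use URI::Escape qw(uri_escape uri_escape_utf8);

# One Greek alpha (U+03B1); its UTF-8 encoding is the two bytes CE B1.
my $chars = "\x{3b1}";

# Escaping the characters yourself yields an already-escaped ASCII string.
my $pre_escaped = uri_escape_utf8($chars);

# A second escaping pass (assumed stand-in for what POST does to form
# values) escapes the '%' signs themselves.
my $double = uri_escape($pre_escaped);

# Encoding to UTF-8 bytes first and escaping once gives the intended result.
my $single = uri_escape(encode("UTF-8", $chars));

print "pre-escaped: $pre_escaped\n";   # %CE%B1
print "double:      $double\n";        # %25CE%25B1
print "single:      $single\n";        # %CE%B1
```

The point of the comparison is that the value handed to POST should be raw UTF-8 bytes, not text that has already been percent-escaped.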
Re^3: HTTP::Request::Common::POST and UTF-8 by scollyer (Sexton) on Sep 28, 2005 at 18:48 UTC
> Won't that escape data twice?

Yup, just discovered that. Your solution appears to work correctly, with the corresponding unescaping being:

    decode("UTF-8", uri_unescape($req_string))

Thanks for this. I think I'll go and hit myself with a stick now. It'll be less pain than doing UTF-8 in Perl ...

Steve Collyer
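
As a rough check of the round trip described above, the sketch below escapes one field value and then reverses it. It works on a single value rather than a whole request string, which is a simplification made for illustration, and again uses `uri_escape` as an assumed stand-in for the escaping done inside POST.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw(encode decode);
use URI::Escape qw(uri_escape uri_unescape);

# The same "<alpha beta gamma delta>" test string as the original script.
my $original = "<\x{3b1}\x{3b2}\x{3b3}\x{3b4}>";

# Sending side: characters -> UTF-8 bytes -> percent-escaped ASCII.
my $escaped = uri_escape(encode("UTF-8", $original));

# Receiving side: percent-escaped ASCII -> UTF-8 bytes -> characters.
my $restored = decode("UTF-8", uri_unescape($escaped));

print "escaped: $escaped\n";   # %3C%CE%B1%CE%B2%CE%B3%CE%B4%3E
print $restored eq $original
        ? "round trip ok\n"
        : "round trip failed\n";
```

The order matters on the way back: unescape first to recover the bytes, then decode the bytes as UTF-8.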