in reply to HTTP::Request::Common::POST and UTF-8

I'm not sure if this will fit your needs, but here's one possible solution. I added one new line to your code and modified another, both marked with ### <--.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP;
    use HTTP::Request::Common;
    use Encode;
    use charnames qw(greek);
    use URI::Escape qw(uri_escape_utf8);   ### <-- new line

    binmode(STDOUT, ":utf8");

    my $utf8_data = "<\N{alpha}\N{beta}\N{gamma}\N{delta}>";
    print $utf8_data, "\n\n";
    print Encode::is_utf8($utf8_data)
        ? "\$utf8_data marked as UTF-8\n\n"
        : "\$utf8_data not marked as UTF-8\n\n";

    my $request = POST("http://localhost/test",
        Content => [
            data      => uri_escape_utf8($utf8_data),   ### <-- modified line
            more_data => "some more data",
        ]
    );

    my $req_string = $request->as_string();
    print Encode::is_utf8($req_string)
        ? "\$req_string marked as UTF-8\n\n"
        : "\$req_string not marked as UTF-8\n\n";
    print $req_string, "\n";

Re^2: HTTP::Request::Common::POST and UTF-8
by ikegami (Patriarch) on Sep 28, 2005 at 16:57 UTC

    Won't that escape data twice? Without actually running it, it looks like
    "\x{1234}"
    would be transformed by uri_escape_utf8 into
    "%E1%88%B4"
    which would be transformed by POST into
    "%25E1%2588%25B4"
    while the right answer would be
    "%E1%88%B4"

    What he actually needs is

        my $request = POST("http://localhost/test",
            Content => [
                data      => encode("UTF-8", $utf8_data),
                more_data => "some more data",
            ]
        );

    The core problem is that the url-encoded format didn't anticipate data in character sets other than US-ASCII. There is a de facto standard: encode the string as UTF-8, then escape the resulting bytes as if they were US-ASCII characters. The code above converts the string to UTF-8 bytes, which POST's guts then escape exactly once.
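To make the single versus double escaping concrete, here is a minimal sketch using only core modules. The `percent_escape` helper is hypothetical and written just for this illustration; it stands in for the escaping that POST performs internally and is not the code POST actually runs.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of any module): percent-escape every byte
# outside the RFC 3986 unreserved set. Note that '%' itself gets escaped.
sub percent_escape {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

my $char  = "\x{1234}";
my $bytes = encode("UTF-8", $char);     # three bytes: E1 88 B4

my $once  = percent_escape($bytes);     # what the server should receive
my $twice = percent_escape($once);      # what happens if you escape, then POST escapes again

print "escaped once:  $once\n";
print "escaped twice: $twice\n";
```

The second pass turns every `%` from the first pass into `%25`, which is why the pre-escaped value arrives mangled.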

      >Won't that escape data twice?

      Yup, just discovered that. Your solution appears to work correctly, with the corresponding unescaping being:

      decode("UTF-8", uri_unescape($req_string))
      Thanks for this.

      I think I'll go and hit myself with a stick now. It'll be less pain than doing UTF-8 in Perl ...

      Steve Collyer
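The unescaping step mentioned above can be sketched as a round trip on a single field value. This assumes you have already isolated the escaped value from the request body; `percent_unescape` is a hypothetical stand-in for `uri_unescape`, used here only so the example needs nothing beyond core modules.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for uri_unescape: turn each %XX back into one byte.
sub percent_unescape {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Escaped form of alpha, beta, gamma, delta (two UTF-8 bytes each).
my $escaped = "%CE%B1%CE%B2%CE%B3%CE%B4";

my $bytes       = percent_unescape($escaped);   # 8 raw bytes
my $byte_length = length($bytes);
my $chars       = decode("UTF-8", $bytes);      # 4 characters

printf "%d bytes became %d characters\n", $byte_length, length($chars);
printf "U+%04X\n", ord($_) for split //, $chars;
```

The order matters: unescape first to get bytes back, then decode those bytes as UTF-8 to get characters.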

Re^2: HTTP::Request::Common::POST and UTF-8
by scollyer (Sexton) on Sep 28, 2005 at 15:32 UTC
    > I'm not sure if this will fit your needs, but here's one
    > possible solution.

    Thanks very much. That looks a lot better. (I needed to upgrade to a new version of URI::Escape though). I guess it might be nice if you could override the default escape routine inside POST, rather than doing it manually.

    It's suitable for the real code, too.

    Steve Collyer
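On the wish for an overridable escape routine: nothing in this thread shows POST offering such a hook, so one workaround is to prepare the form data before handing it over. The sketch below uses a hypothetical helper, `utf8_form`, that UTF-8 encodes every key and value up front. It does not replace POST's escaping; it only ensures POST sees bytes rather than wide characters.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: take a key/value list and return an array reference
# with every key and value encoded to UTF-8 bytes.
sub utf8_form {
    my @pairs = @_;
    my @out;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @out, encode("UTF-8", $key), encode("UTF-8", $value);
    }
    return \@out;
}

my $form = utf8_form(
    data      => "<\x{3b1}\x{3b2}\x{3b3}\x{3b4}>",
    more_data => "some more data",
);

printf "%s is %d bytes\n", $form->[0], length($form->[1]);

# Assumed usage with HTTP::Request::Common, as in the code earlier in the thread:
#   my $request = POST("http://localhost/test", Content => $form);
```

Doing the encoding in one place keeps each call site from having to remember it for every field.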