http://qs1969.pair.com?node_id=11135500

vr has asked for the wisdom of the Perl Monks concerning the following question:

I'm afraid this question is OT -- not really Perl, but web basics.

With WWW::Mechanize, I could get the form from the page with e.g. form_name, then just set the required charset (accept_charset), and it all "automatically" worked as expected: set the text values of input fields (field) using ordinary (Unicode) Perl strings, click the submit button, and the job's done.

Now I'm moving from WWW::Mechanize to Net::Async::HTTP, whose POST method accepts an array reference of "field_name" => "value" pairs. Here I tried different combinations of encoding the text field values (or not encoding at all), and also providing things like

content_type => 'application/x-www-form-urlencoded; charset=utf-8',

as additional %args to POST, but to no avail: reading back from the server what I posted, I get various sorts of garbage, double-encoded text as far as I can guess.

How do I POST text with the correct charset? I suspect the answer is very simple and obvious to anyone with basic knowledge of web technologies, but not to me, alas.

5 min edit: the charset to use is actually "koi8-r"; that's what the server expects, what works with WWW::Mechanize, and what I was trying (without success) with Net::Async::HTTP in the first place.

Re: How to set charset when POSTing?
by vr (Curate) on Jul 30, 2021 at 16:27 UTC

    Found it. The previous WWW::Mechanize solution works OK because forms there are encoded wholesale, each and every name and value, regardless. See 455-457.

    With the new Net::Async::HTTP code, I was trying to be smart, only encoding those text values which obviously need to be encoded. However, another value sneaked in, a string holding a simple uid in hexadecimal, which happens to be utf8::upgraded. Then the URI module (used by HTTP::Request::Common) produces the whole url-encoded form data in utf-8, as I understand it. I don't know if that's expected and documented behaviour, but it's not what the server processing my forms understands. So the solution is to always loop through and encode everything, as in the lines 455-457 mentioned above.

    use strict;
    use warnings;
    use utf8;
    use feature 'say';
    use URI;
    use Encode 'encode';
    use charnames 'cyrillic';

    my $uri = URI-> new( 'http:' );

    my $ascii_uid_str = 'abc';          # whatever ascii
    my $high_ascii_octets = encode( 'koi8-r', qq(\N{zhe}));
                        # = chr 214;    # (same as above)
                                        # whatever, too, but high-ascii

    $uri-> query_form( $ascii_uid_str, $high_ascii_octets );
    say $uri-> query;

    utf8::upgrade( $ascii_uid_str );    # oops, unexpected

    $uri-> query_form( $ascii_uid_str, $high_ascii_octets );
    say $uri-> query;

    __END__

    abc=%D6
    abc=%C3%96
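
    For completeness, a minimal sketch of the "always loop through and encode everything" fix, under the assumption that the form is held as a flat list of name/value pairs. The helper name encode_form and the field names uid and text are made up for illustration; they are not part of URI, HTTP::Request::Common, or Net::Async::HTTP. Because Encode's encode returns plain octets, the upgraded ASCII string can no longer drag the whole query into utf-8.

```perl
use strict;
use warnings;
use utf8;
use feature 'say';
use URI;
use Encode 'encode';

# Hypothetical helper, not part of URI or Net::Async::HTTP:
# turn every name and value into octets in the target charset.
sub encode_form {
    my ( $charset, @pairs ) = @_;
    return map { encode( $charset, $_ ) } @pairs;
}

my $uid = 'abc';
utf8::upgrade( $uid );    # simulate the upgraded ASCII string that sneaked in

my @form    = ( uid => $uid, text => "\x{436}" );    # U+0436 is Cyrillic zhe
my @encoded = encode_form( 'koi8-r', @form );

my $uri = URI->new( 'http:' );
$uri->query_form( @encoded );
my $query = $uri->query;
say $query;    # uid=abc&text=%D6
```

    Presumably the same encoded list can then be handed to POST as the array reference of pairs, instead of the partially encoded one.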
Re: How to set charset when POSTing?
by perlfan (Vicar) on Jul 29, 2021 at 21:35 UTC
    If it needs to be an actual header, then you need to use JavaScript to handle the POST action in the onClick handler of an <input type=button ...>. Otherwise, you can send it as an <input type=hidden ...> and then just pull it out of the POST params to know what charset can be accepted.