in reply to [SOLVED] How do I convince my Perl script that UTF-8 from an HTML form really is UTF-8?

Most of the steps you did were fine, so just a few remarks:

I don't see some other steps in your script:

About diagnostics: Windows code page 1252 and ISO-8859-1 are different but very similar, so there's no way to distinguish between the two in this case.
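To illustrate why the two are hard to tell apart, here is a minimal sketch using the core Encode module. The sample bytes are my own picks, not taken from the original script: 0xE9 decodes to the same character under both encodings, while 0x80 is one of the few bytes where they differ.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte 0xE9 is "e acute" in both encodings, so it cannot tell them apart.
my $same_latin1 = decode('ISO-8859-1', "\xE9");
my $same_cp1252 = decode('cp1252',     "\xE9");

# Byte 0x80 differs: a C1 control character in ISO-8859-1,
# the euro sign (U+20AC) in CP1252.
my $diff_latin1 = decode('ISO-8859-1', "\x80");
my $diff_cp1252 = decode('cp1252',     "\x80");

printf "0xE9: U+%04X vs U+%04X\n", ord($same_latin1), ord($same_cp1252);
printf "0x80: U+%04X vs U+%04X\n", ord($diff_latin1), ord($diff_cp1252);
```

Unless the input happens to contain bytes in the 0x80-0x9F range, both decodings give identical results.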


Re^2: How do I convince my Perl script that UTF-8 from an HTML form really is UTF-8?
by Cody Fendant (Hermit) on Mar 11, 2020 at 19:28 UTC
    As of today, static pages can declare their own encoding in a meta element, e.g. <meta charset="utf-8">

    This is what I meant when I said I declared the charset in the HTML.

    If you write UTF-8 in the response CGI script, you ought to print "Content-type: text/plain; charset=utf-8\n\n":

    The server had already added the UTF-8 content type in the HTTP headers

    binmode STDOUT, ":encoding(UTF-8)";

    I had done this but forgot to include it in the code snippet. It still didn't work.

      The server had already added the UTF-8 content type in the HTTP headers

      Could you please elaborate on which server you are using, and how you configured it to modify the content type of a CGI script? You could also check in your browser which encoding it uses for your text/plain response.

      I also had a look at the source of CGI::Simple and found out:

      • The module will decode parameters for you only if you set the global variable $CGI::Simple::PARAM_UTF8 to a true value. That's sort of difficult to guess, since it isn't documented. Of course, you can decode yourself, but it looks like you didn't.
      • The module will add ; charset=utf-8 to the content type header only if you print it as print $q->header(-type => 'text/plain');, but not if you just print "Content-type: text/plain\n\n";.
      So, the following just works for me:
      use strict;
      use warnings;
      use CGI::Simple;
      $CGI::Simple::PARAM_UTF8 = 1;
      my $q = CGI::Simple->new();
      $q->charset('utf-8');
      binmode STDOUT, ':encoding(UTF-8)';
      print $q->header(-type => 'text/plain');
      print $q->param('text'), "\n";
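For the "decode yourself" route mentioned above, a minimal sketch with the core Encode module might look like the following. The variable $raw is a hypothetical stand-in for an undecoded parameter value; in a real script it would come from the form data rather than a literal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw form value: the two UTF-8 bytes for U+00E9,
# as they would arrive undecoded from the request.
my $raw = "\xC3\xA9";

# Decode the bytes into a Perl character string.
# FB_CROAK dies on invalid UTF-8 instead of silently substituting;
# LEAVE_SRC keeps decode() from emptying $raw as a side effect.
my $text = decode('UTF-8', $raw, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "bytes: %d, characters: %d\n", length($raw), length($text);
```

Once the value is a character string, the :encoding(UTF-8) layer on STDOUT encodes it exactly once on output. Decoding a value that the module has already decoded would be a mistake, so this is an alternative to setting $CGI::Simple::PARAM_UTF8, not an addition to it.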