koszta5 has asked for the wisdom of the Perl Monks concerning the following question:

I have a simple web page that uses CGI.pm. This is what I do:

When I call any Perl CGI.pm function and use the Czech character "ě" for the value of a textfield, the label of a radio_group, or anything else, I get "�" instead of "ě".

This is extremely weird, since the whole page is UTF-8 (<meta name="charset" content="utf-8"/>), especially since this works:

$textfield_value = "ěěěěě";
print '<textfield value="'.$textfield_value.'" >';

Therefore I am positive it has to be CGI.pm causing the problem. I tried to put

use utf8; utf8::decode($textfield_value);

at the beginning of my script, and it fixed the CGI.pm problem but made all other characters in the script (those that are printed the regular way) look funny.

Any ideas? Note: it happens to me here as well, even as I edit this question: what I previously wrote as ěěě is displayed in the edit window as &#283;. The same thing happens with the name of this thread: it should be "CGI.pm encoding - wrong encoding for ě" and not "CGI.pm encoding - wrong encoding for &#283;".

Replies are listed 'Best First'.
Re: CGI.pm encoding - wrong encoding for &#283;
by choroba (Cardinal) on Jan 25, 2012 at 12:43 UTC
      Sorry, agreed. I won't do it again. Sorry for the noob practice.
Re: CGI.pm encoding - wrong encoding for &#283;
by Anonymous Monk on Jan 25, 2012 at 12:02 UTC
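    use utf8;                               # the source file contains literal UTF-8 characters
    use CGI qw();
    use IO::File qw();                      # makes the binmode/print methods available on STDOUT
    my $c = CGI->new;
    STDOUT->binmode(':encoding(UTF-8)');    # encode everything printed to STDOUT as UTF-8
    # declare the charset in the HTTP header, then print the form field
    STDOUT->print($c->header('text/html;charset=UTF-8'));
    STDOUT->print($c->textfield('ě'));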

      it works for this particular file but not for my complex script

      How can I force CGI to use utf8 as default?

      And how is it possible that all other national chars (ščřžýáíéůú) are good

        it works for this particular file but not for my complex script

        I do not understand. What is a complex script? What does not work? Explain this, use more words.

        How can I force CGI to use utf8 as default?

        Exactly as I have shown. There are several parts to it:

        1. use utf8; lets you write literal UTF-8 characters in Perl source code. If you always use Perl escapes instead, such as \x{11b}, and HTML escapes, such as &#283;, so that your source code is ASCII-only, this pragma is not necessary.

        2. The header method sets the HTTP header with the appropriate encoding. You seem to be under the misconception that <meta name="charset" content="utf-8"/> is enough. In fact the HTTP header is always relevant, and the in-file meta header is only taken into consideration when the HTTP header is absent, e.g. when the HTML is loaded from the file system.

        3. CGI writes to STDOUT. The binmode method, inherited from IO::Handle, sets the encoding to UTF-8 for all STDOUT output, e.g. output written via the print method.
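
        To make points 1 and 3 concrete, here is a minimal sketch of what the :encoding(UTF-8) layer does to a character string on output. It writes to an in-memory filehandle instead of STDOUT so that the resulting bytes can be inspected; the variable names are only illustrative and are not taken from your script.

```perl
use strict;
use warnings;
use utf8;    # the literal below is read as one character, U+011B

my $char = 'ě';

# Open an in-memory filehandle with the same layer used on STDOUT above.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer)
    or die "Cannot open in-memory handle: $!";
print {$out} $char;
close($out);

# $buffer now holds the encoded bytes; show them in hex.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $buffer;
printf "characters: %d, bytes written: %d (%s)\n",
    length($char), length($buffer), $hex;
# prints: characters: 1, bytes written: 2 (C4 9B)
```

        Perl holds one character internally, and the layer turns it into the two UTF-8 bytes only at the moment it is printed.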

        And how is it possible that all other national chars (ščřžýáíéůú) are good

        I do not understand this question. All characters have the same properties, or nearly so, so of course they behave the same as with ě from the original example. Perhaps you should start to show some source code to demonstrate what you mean.
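
        If it helps with producing such a test case, the following sketch lists the code points Perl sees in a string, which shows whether a value is decoded text or raw UTF-8 bytes. The helper name dump_codepoints is made up for this example; only the core Encode module is assumed.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Hypothetical helper: list the code points Perl sees in a string.
sub dump_codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

my $decoded = 'ě';                        # character string (use utf8 is on)
my $raw     = encode('UTF-8', $decoded);  # the same text as undecoded bytes

print dump_codepoints($decoded), "\n";                  # U+011B
print dump_codepoints($raw), "\n";                      # U+00C4 U+009B
print dump_codepoints(decode('UTF-8', $raw)), "\n";     # U+011B
```

        Running something like this on the values that display correctly and on the ones that do not would show which of them are decoded characters and which are still bytes.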