in reply to CGI.pm encoding - wrong encoding for ě

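use strict;
use warnings;
use utf8;                 # source is UTF-8, so 'ě' below is one character
use CGI qw();
use IO::File qw();        # loads IO::Handle, which provides STDOUT->binmode
my $c = CGI->new;
STDOUT->binmode(':encoding(UTF-8)');                   # encode all printed text as UTF-8
STDOUT->print($c->header('text/html;charset=UTF-8'));  # charset goes in the HTTP header
STDOUT->print($c->textfield('ě'));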

Re^2: CGI.pm encoding - wrong encoding for ě
by koszta5 (Initiate) on Jan 25, 2012 at 13:08 UTC

    It works for this particular file, but not for my complex script.

    How can I force CGI to use utf8 as the default?

    And how is it possible that all the other national chars (ščřžýáíéůú) are fine?

      it works for this particular file but not for my complex script

      I do not understand. What is a complex script? What does not work? Explain this; use more words.

      How can I force CGI to use utf8 as default?

      Exactly as I have shown. There are several parts to it:

      1. use utf8; lets you write literal UTF-8 characters in Perl source code. If you always use Perl escapes such as \x{11b} and HTML escapes such as &#283; instead, so that your source code is ASCII-only, this pragma is not necessary.

      2. The header method sets the HTTP Content-Type header with the appropriate charset. You seem to be under the misconception that <meta name="charset" content="utf-8"/> is enough. In fact, the charset in the HTTP header takes precedence, and the in-document meta element is only taken into consideration when the HTTP header is absent, e.g. when the HTML is loaded from the file system.

      3. A CGI program sends its output to STDOUT. The binmode method, inherited from IO::Handle, pushes an :encoding(UTF-8) layer onto STDOUT, so every character string printed to it (e.g. with the print method) is encoded as UTF-8.
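A small self-contained sketch of points 1 and 3, using only core Perl: the ASCII-only escape \x{11b} yields the same one-character string as a literal ě under use utf8, and an :encoding(UTF-8) layer turns that character into the two bytes C4 9B on output. An in-memory filehandle stands in for STDOUT here purely so the bytes can be inspected.

```perl
use strict;
use warnings;
use utf8;    # literal non-ASCII characters in this file are decoded as UTF-8

# Point 1: an ASCII-only escape and a literal character are the same string.
my $escaped = "\x{11b}";
my $literal = 'ě';
print "same character: ", ($escaped eq $literal ? "yes" : "no"), "\n";
printf "length: %d, code point: U+%04X\n", length($escaped), ord($escaped);

# Point 3: an :encoding(UTF-8) layer encodes characters on output.
# An in-memory filehandle stands in for STDOUT so the bytes can be inspected.
my $buffer = '';
open my $fh, '>:encoding(UTF-8)', \$buffer or die "open: $!";
print {$fh} $escaped;
close $fh or die "close: $!";

# $buffer now holds the raw bytes that would have reached the browser.
print "bytes written: ",
    join(' ', map { sprintf('%02X', ord($_)) } split //, $buffer), "\n";
```

Expected output: "same character: yes", "length: 1, code point: U+011B" and "bytes written: C4 9B".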

      And how is it possible that all other national chars (ščřžýáíéůú) are good

      I do not understand this question. All these characters have the same properties, or nearly so, so of course they behave the same as ě from the original example. Perhaps you should start showing some source code to demonstrate what you mean.

        You seem to be under the misconception that <meta name="charset" content="utf-8"/> is enough.

        It is not just "not enough"; it is "not anything". That line is total nonsense. No browser will pay any attention to it whatsoever.

        Either of the lines below should be detected by a browser though.

        <!-- HTML 4.x style --> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <!-- HTML5 style --> <meta charset="utf-8">

        Well, I tried all that, but really... let's talk about the last issue, since that is what is bugging me the most. I think it holds the solution to the whole problem.

        What I mean by "all other national chars (ščřžýáíéůú) are good" is that when I call a CGI.pm function, let's say:

        textfield({-value=>'ščřžýáíéůú'});

        It displays OK, without any changes to my current code. ě is the only national letter for which this problem occurs... and I am asking why?!
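One plausible explanation, stated as an assumption to check against the escapeHTML code in your own copy of CGI.pm: without use utf8, the letters reach CGI.pm as raw UTF-8 bytes, and escapeHTML, when the charset is ISO-8859-1 (CGI.pm's default) or windows-1252, rewrites the bytes 0x8B and 0x9B as HTML entities as a workaround for old browsers. Of the letters listed, only ě contains one of those bytes in its UTF-8 encoding (C4 9B), so only ě would be broken. That would also explain why use utf8; fixes it, since ě then arrives as one character rather than two bytes. The sketch below uses only the core Encode module to list each letter's UTF-8 bytes and flag the affected ones.

```perl
use strict;
use warnings;
use utf8;                  # the letters below are written literally in UTF-8
use Encode qw(encode);

# Bytes that (by assumption) escapeHTML rewrites under a Latin-1 charset.
my %rewritten = (0x8B => 1, 0x9B => 1);

my @affected;
for my $letter (split //, 'ščřžýáíéůúě') {
    my @bytes = unpack 'C*', encode('UTF-8', $letter);
    my $hit   = grep { $rewritten{$_} } @bytes;    # count of matching bytes
    push @affected, $letter if $hit;
    # Print the code point rather than the letter itself, so no
    # "Wide character" warning is raised on an unconfigured STDOUT.
    printf "U+%04X  %-6s%s\n", ord($letter),
        join(' ', map { sprintf('%02X', $_) } @bytes),
        $hit ? '  <- contains 0x8B/0x9B' : '';
}
```

Only the line for U+011B (ě, bytes C4 9B) is flagged.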

        OK,

        use utf8;

        fixes the problem with CGI-generated fields... BUT

        When I select data containing ANY national characters from the DB and print it, the output of those characters is messed up. So I can have either messed-up CGI fields (without use utf8;) OR messed-up plain prints of variables selected from the DB (with use utf8;).
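The symptom is consistent with double encoding, although that is an inference rather than a certainty: if the database driver hands back undecoded UTF-8 bytes, then printing them through an :encoding(UTF-8) layer (or mixing them with decoded character strings) makes perl treat each byte as a separate Latin-1 character and encode it a second time. In this sketch $from_db is a hypothetical stand-in for a fetched value, and an in-memory filehandle stands in for STDOUT.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a value fetched from a utf8 table: the driver
# returns the raw UTF-8 bytes of e-caron (C4 9B), not a decoded character.
my $from_db = "\xC4\x9B";

# Print it through the same kind of layer as STDOUT->binmode(':encoding(UTF-8)').
my $out = '';
open my $fh, '>:encoding(UTF-8)', \$out or die "open: $!";
print {$fh} $from_db;
close $fh or die "close: $!";

# Each input byte was treated as a Latin-1 character and encoded again.
print join(' ', map { sprintf('%02X', ord($_)) } split //, $out), "\n";
```

This prints C3 84 C2 9B: four bytes where two were expected, which a browser renders as "Ä" followed by a stray control character.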

        Thanks, Corion, that will be it. My DB is

        CHARSET=utf8 COLLATE=utf8_unicode_ci

        I thought there would be no problem... How should I decode?
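A minimal sketch of decoding at the input boundary, assuming the driver returns raw UTF-8 bytes: decode each text value once, right after fetching it, with Encode's decode, and let the output layer encode it once on the way out. $row is a hypothetical stand-in for a row returned by something like DBI's fetchrow_hashref; the real fetch is omitted so the example runs without a database. If the database is MySQL accessed through DBD::mysql, its mysql_enable_utf8 connection attribute is meant to do this decoding inside the driver; check the DBD::mysql documentation for your version before relying on it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for a fetched row: text columns arrive as raw
# UTF-8 bytes (here the bytes of e-caron followed by s-caron).
my $row = { name => "\xC4\x9B\xC5\xA1" };

# Decode every defined text value once, immediately after fetching.
my %decoded;
for my $col (keys %$row) {
    $decoded{$col} = defined $row->{$col} ? decode('UTF-8', $row->{$col}) : undef;
}

# The value is now a character string that can be mixed freely with strings
# from "use utf8" source and printed through an :encoding(UTF-8) layer.
my $out = '';
open my $fh, '>:encoding(UTF-8)', \$out or die "open: $!";
print {$fh} $decoded{name};
close $fh or die "close: $!";

printf "characters: %d, bytes out: %s\n", length($decoded{name}),
    join(' ', map { sprintf('%02X', ord($_)) } split //, $out);
```

Expected output: "characters: 2, bytes out: C4 9B C5 A1", i.e. each character is encoded exactly once.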