Re^3: CGI.pm encoding - wrong encoding for ě

it works for this particular file but not for my complex script

I do not understand. What is a complex script? What does not work? Explain this; use more words.

How can I force CGI to use utf8 as default?

Exactly as I have shown. There are three parts to it:

1. use utf8; lets you write literal UTF-8 characters in Perl source code. If you always use Perl escapes such as \x{11b} and HTML escapes such as &#x11b; instead, so that your source code is ASCII-only, this pragma is not necessary.

2. The header method sets the encoding in the HTTP Content-Type header. You seem to be under the misconception that <meta name="charset" content="utf-8"/> is enough. In fact, the HTTP header is always relevant, and the in-document meta element is only taken into consideration when the HTTP header is absent, e.g. when the HTML is loaded from the file system.

3. CGI writes to STDOUT. The binmode method that STDOUT inherits from IO::Handle, called with the :encoding(UTF-8) layer, makes all output to STDOUT (e.g. from print) encoded as UTF-8.
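The three parts can be put together in one small script. This is a minimal sketch, assuming CGI.pm is installed (it has not shipped with core Perl since 5.22) and using its function-oriented interface; the helper page_header and the field name word are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;               # part 1: this source file contains literal UTF-8 text
use CGI qw(:standard);  # function-oriented interface: header(), textfield()

# Under "use utf8" the literal and the ASCII-only escape are the same
# one-character string, U+011B (LATIN SMALL LETTER E WITH CARON).
my $literal = 'ě';
my $escaped = "\x{11b}";

# Part 2: declare the encoding in the HTTP Content-Type header.
# By default CGI.pm's header() announces ISO-8859-1 instead.
# (page_header is a hypothetical helper name.)
sub page_header {
    return header(-type => 'text/html', -charset => 'UTF-8');
}

# Part 3: encode every character printed to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

print page_header();
print textfield(-name => 'word', -value => "ščřžýáíéůú $literal$escaped"), "\n";
```

Dropping any one of the three parts is enough to get mangled characters in the browser, which is why all three are needed together.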

And how is it possible that all other national chars (ščřžýáíéůú) are good

I do not understand this question. All characters have the same properties, or nearly so, so of course they behave the same as ě from the original example. Perhaps you should start showing some source code to demonstrate what you mean.

Re^4: CGI.pm encoding - wrong encoding for ě
by tobyink (Canon) on Jan 25, 2012 at 14:16 UTC

    You seem to be under the misconception that <meta name="charset" content="utf-8"/> is enough.

    It is not only "not enough"; it is "not anything". That line is total nonsense. No browser will pay any attention to it whatsoever.

    Either of the lines below should be detected by a browser though.

    <!-- HTML 4.x style --> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <!-- HTML5 style --> <meta charset="utf-8">
Re^4: CGI.pm encoding - wrong encoding for ě
by koszta5 (Initiate) on Jan 25, 2012 at 14:24 UTC

    ok,

    use utf8;

    fixes the problem with CGI-generated fields... BUT

    when I select data that has ANY national characters from the DB and print it, the output of these characters is messed up. So I can have either messed-up CGI fields (without use utf8;) OR messed-up output from a regular print of variables selected from the DB (with use utf8;).

      You will need to learn what encoding is written to the database, and what encoding is returned from the database. Then you will need to Encode::decode from that encoding.

      You will need to repeat that process for all points in your code where you retrieve data from outside of the Perl interpreter, or where you hand off data to the outside of the Perl interpreter.
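      A minimal sketch of decoding at the point where data enters Perl and encoding where it leaves, using the Encode module. It assumes the database hands back UTF-8 encoded bytes; the byte string below is a hypothetical stand-in for a fetched value, so the example runs without a database.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for a value fetched from the database: the raw
# UTF-8 bytes of "ěš", as a driver that does not decode might return them.
my $raw = "\xC4\x9B\xC5\xA1";

# Decode where the data enters Perl: bytes in, characters out.
my $text = decode('UTF-8', $raw);
printf "%d bytes in, %d characters out\n", length $raw, length $text;

# Encode where the data leaves Perl again (an :encoding(UTF-8) layer on
# the output handle does the same job for print).
my $bytes = encode('UTF-8', $text);
```

      Once every input is decoded and every output is encoded, the strings inside the program are plain characters and use utf8; no longer fights with data from the DB.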

Re^4: CGI.pm encoding - wrong encoding for ě
by koszta5 (Initiate) on Jan 25, 2012 at 13:46 UTC

    Well, I tried all that, but really... let's talk about the last issue, since that is what is bugging me the most. I think it holds the solution to the whole problem.

    What I mean by "all other national chars (ščřžýáíéůú) are good" is that when I call a CGI.pm function, let's say:

    textfield({-value=>'ščřžýáíéůú'});

    It displays OK, without any changes to my current code. ě is the only national letter for which this problem occurs, and I am asking why.

      You did not show your code, and I refuse to speculate. It could help you to read http://p3rl.org/UNI completely; it teaches the concept of encoding in Perl. When you come to a forum, you are supposed to listen to the advice, properly answer requests for more information, and not go off orthogonally on a wild goose chase just because it seems like a good idea to you. In other words, follow the established practices, or you will soon be on your own again.

      I'm not angry; I just do not want to waste my time as a volunteer. Note that the regulars here are experts at solving problems, but you are so inexperienced that you don't even know that cross-posting without marking the messages as cross-posts is bad. Thus, you have to trust that we generally make the right decisions. Now please kindly show your code to demonstrate the problem, or I'll eject myself from this thread.

Re^4: CGI.pm encoding - wrong encoding for ě
by koszta5 (Initiate) on Jan 25, 2012 at 14:45 UTC

    Thanks Corion, that must be it. My DB is

    CHARSET=utf8 COLLATE=utf8_unicode_ci

    I thought there would be no problem. How should I decode?
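    One way to apply the decoding to whole rows is sketched below. It assumes the bytes coming back are UTF-8 (the table is declared utf8, but what actually arrives also depends on the connection character set) and that rows are fetched as hash references, the shape DBI's fetchrow_hashref returns. The helper decode_row is hypothetical, and a literal hash stands in for a fetched row so the sketch runs without a database.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every defined column of a row hash,
# assuming the driver hands back UTF-8 encoded bytes.
sub decode_row {
    my ($row) = @_;
    my %decoded;
    for my $column (keys %$row) {
        my $value = $row->{$column};
        # Keep NULLs (undef) as undef; decode everything else.
        $decoded{$column} = defined $value ? decode('UTF-8', $value) : undef;
    }
    return \%decoded;
}

# Stand-in for a fetched row, so the sketch runs without a database.
my $row     = { id => 1, name => "\xC4\x9B\xC5\xA1", note => undef };
my $decoded = decode_row($row);
printf "name: %d bytes before, %d characters after\n",
    length $row->{name}, length $decoded->{name};
```

    If the database is MySQL, as the CHARSET/COLLATE syntax suggests, DBD::mysql also documents a mysql_enable_utf8 connection attribute that asks the driver to do this decoding itself; check the documentation for the driver version you have before relying on it.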