alkis has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I am running a Perl CGI script that searches through data files. These files are encoded in UTF-8. Example:
open(FILE,"<:encoding(utf8)","$FILE"); @LINES = <FILE>; close(FILE);
The results appear in the Firefox browser, but as ISO-8859-1, and they look like this: το μεσημέρι στο κέντρο της. As a result, the search is not correct. How can I change the encoding of the results? Thanks.

Replies are listed 'Best First'.
Re: encode charset
by moritz (Cardinal) on Mar 06, 2008 at 14:24 UTC
    Note that utf8 and utf-8 are not the same thing in Perl: utf8 is Perl's lax internal variant, while utf-8 is the strict, standard encoding. Normally you want utf-8.
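
    A minimal sketch of that distinction, assuming the core Encode module is available. It only asks Encode which encoding object each spelling resolves to; the variable names are illustrative:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Encode maps the two spellings to two different encoding objects.
my $strict = find_encoding('UTF-8')->name;   # the strict, standard encoding
my $lax    = find_encoding('utf8')->name;    # Perl's lax internal variant

print "UTF-8 resolves to: $strict\n";   # prints "UTF-8 resolves to: utf-8-strict"
print "utf8  resolves to: $lax\n";      # prints "utf8  resolves to: utf8"
```

    Names are matched case-insensitively, so the hyphen is what selects the strict encoding.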

    You have to take a few steps to get utf-8 working:

    1) You need to set up STDOUT: binmode STDOUT, ':encoding(UTF-8)';

    2) Send a header with the correct charset: print "Content-Type: text/html; charset=utf-8\n\n";

    3) Ensure that your HTML doesn't contain an http-equiv meta tag with a different charset.
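
    The steps above could be combined roughly as follows. This is only a sketch: build_response is a hypothetical helper, the Greek sample text is made up for illustration, and the body is assumed to be an already decoded character string that needs no HTML escaping:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the header plus a tiny HTML page.
# The charset appears only in the HTTP header, not in a meta tag.
sub build_response {
    my ($body) = @_;
    return "Content-Type: text/html; charset=utf-8\n\n"
         . "<html><head><title>Results</title></head>"
         . "<body><p>$body</p></body></html>\n";
}

# Step 1: encode everything printed to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

# Steps 2 and 3: send the header, then HTML without an http-equiv tag.
my $sample = "\x{03BA}\x{03AD}\x{03BD}\x{03C4}\x{03C1}\x{03BF}";   # illustrative Greek word
print build_response($sample);
```

    Because the output layer does the encoding, the script itself works with character strings throughout and never concatenates raw bytes into the page.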

    To test it, open your page with Firefox, press Ctrl+I, and check which encoding it thinks the page is in.

    And make sure to read perluniintro and perlunifaq.

      In order to inform your browser that you are sending utf8 characters, add this to your head section:
      <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
        No! There's no reason to put that in the HTML when you send a correct header.

        You'll only run into problems if you change the charset and forget to adjust it in both places.

        The meta header can only be parsed if the browser already guessed a somewhat compatible encoding, and is an ugly hack for hosting services where you can't influence the header.

        But if you're using CGI anyway, you can just as well use the right solution, not an ugly hack.

Re: encode charset
by alkis (Acolyte) on Mar 06, 2008 at 17:36 UTC
    The data files can only recognize Greek characters in iso-8859-7, not utf8, while the browser recognizes Greek characters only in utf8! How can I match a word given by the user (through the browser) against the data file? I tried all the combinations that you told me. I don't have a problem with English characters, only with Greek ones.

      Tell Perl that your data files are in iso-8859-7 when you read them in, i.e. open them with "<:encoding(iso-8859-7)". Also tell Perl that the input received from the browser via CGI is in utf-8, with my $word = Encode::decode_utf8(param('word')); or similar, in case it's not already flagged as utf8 (which is hard to tell without seeing the code). Both sides of the match are then decoded into Perl character strings, which handle wide characters correctly.
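
      A self-contained sketch of that idea, with loud assumptions: the temporary file stands in for a real iso-8859-7 data file, $raw_word stands in for the undecoded bytes that param('word') might return, and the Greek word is made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

my $greek = "\x{03BA}\x{03AD}\x{03BD}\x{03C4}\x{03C1}\x{03BF}";   # illustrative Greek word

# Stand-in for a data file stored on disk as iso-8859-7 bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode('iso-8859-7', "$greek\n");
close $tmp or die "Cannot close $path: $!";

# Stand-in for the raw UTF-8 bytes submitted by the browser.
my $raw_word = encode('UTF-8', $greek);

# Decode both sides into Perl character strings.
my $word = decode('UTF-8', $raw_word);
open my $in, '<:encoding(iso-8859-7)', $path or die "Cannot open $path: $!";
chomp(my @lines = <$in>);
close $in;

# Compare characters with characters.
my @hits   = grep { index($_, $word) >= 0 } @lines;
my @misses = grep { index($_, $raw_word) >= 0 } @lines;   # undecoded bytes do not match

printf "decoded word: %d hit(s), raw bytes: %d hit(s)\n", scalar @hits, scalar @misses;
```

      The second grep is there only to show why the search fails when the browser input is left as bytes: the file side holds characters, so the two strings never compare equal.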

      (If you can't get it to work, please show the actual code you've tried; that makes it easier to help...)