encode charset

alkis has asked for the wisdom of the Perl Monks concerning the following question:

Re: encode charset by moritz (Cardinal) on Mar 06, 2008 at 14:24 UTC
Note that utf8 and utf-8 are not the same thing (in Perl, `utf8` names the lax internal encoding, while `UTF-8` is the strict, standard one); normally you deal with utf-8. You have to take a few steps to get utf-8 working: 1) Set up STDOUT: `binmode STDOUT, ':encoding(UTF-8)';` 2) Send a header with the correct charset: `print "Content-Type: text/html; charset=utf-8\n\n";` 3) Ensure that your HTML doesn't contain an http-equiv meta tag with a different charset. To test it, open your page with Firefox, press Ctrl+I, and check which encoding it thinks the page is in. And make sure to read perluniintro and perlunifaq.
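The three steps above can be put together roughly like this. It is only a minimal sketch, not the original poster's script: the helper name `build_page` and the sample Greek word are made up for illustration, and the word is written with `\x{...}` escapes so the example does not depend on how the source file itself is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the complete CGI response as a Perl
# character string (not yet encoded to bytes).
sub build_page {
    # Sample text: the Greek word "kalimera", given as code points.
    my $greek = "\x{3ba}\x{3b1}\x{3bb}\x{3b7}\x{3bc}\x{3ad}\x{3c1}\x{3b1}";

    # Step 2: the header names the charset (note the semicolon).
    my $header = "Content-Type: text/html; charset=utf-8\n\n";

    # Step 3: the HTML carries no http-equiv meta tag that could
    # contradict the header.
    my $body = "<html><head><title>Test</title></head>"
             . "<body><p>$greek</p></body></html>\n";

    return $header . $body;
}

# Step 1: encode everything printed to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

print build_page();
```

Building the page as a character string and letting the STDOUT layer do the encoding keeps the charset decision in one place.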
Re^2: encode charset by Akoya (Scribe) on Mar 06, 2008 at 21:48 UTC
In order to inform your browser that you are sending utf8 characters, add this to your head section: `<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">`
Re^3: encode charset by moritz (Cardinal) on Mar 07, 2008 at 08:24 UTC
No! There's no reason to put that in the HTML when you send a correct header. You'll only get problems when you change the charset and forget to adjust it in both places. The meta tag can only be parsed if the browser has already guessed a somewhat compatible encoding, and it is an ugly hack for hosting services where you can't influence the header. But if you're using CGI anyway, you can just as well use the right solution, not an ugly hack.
Re: encode charset by alkis (Acolyte) on Mar 06, 2008 at 17:36 UTC
The data files can only recognize Greek characters in iso-8859-7, and not utf8, while the browser recognizes Greek characters only in utf8! How can I match a word that is given by the user (through the browser) against the data file? I tried all the combinations that you told me. I don't have a problem with the English characters, only with the Greek!
Re^2: encode charset by almut (Canon) on Mar 06, 2008 at 18:54 UTC
Tell Perl that your data files are in iso-8859-7 when you read them in (i.e. open them with `"<:encoding(iso-8859-7)"`), and tell Perl that the input received from the browser via CGI is in utf-8 (`my $word = Encode::decode_utf8(param('word'));` or similar, in case it's not already flagged as utf8, which is hard to tell without seeing the code). Both sides of what you want to match are then decoded into Perl character strings, which handle wide characters correctly. (If you can't get it to work, please show the actual code you've tried; that makes it easier to help.)
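As a rough sketch of that idea (not your actual code, which I haven't seen): the temporary file below stands in for a real iso-8859-7 data file, and `$raw_input` stands in for the undecoded bytes that `param('word')` would return. Both names and the sample word are assumptions made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode_utf8);
use File::Temp qw(tempfile);

# Sample Greek word ("logos"), written as code points.
my $greek = "\x{3bb}\x{3cc}\x{3b3}\x{3bf}\x{3c2}";

# Stand-in for the real data file: one word per line, stored as iso-8859-7.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':encoding(iso-8859-7)';
print {$tmp} "$greek\n";
close $tmp or die "close: $!";

# Stand-in for param('word'): the raw UTF-8 bytes sent by the browser.
my $raw_input = encode('UTF-8', $greek);

# Decode the browser input from UTF-8 bytes into Perl characters.
my $word = decode_utf8($raw_input);

# Read the data file through a decoding layer, so each line is also
# a Perl character string.
open my $fh, '<:encoding(iso-8859-7)', $path or die "open $path: $!";
my $found = 0;
while (my $line = <$fh>) {
    chomp $line;
    $found = 1 if $line eq $word;    # character-level comparison
}
close $fh;

binmode STDOUT, ':encoding(UTF-8)';
print $found ? "match\n" : "no match\n";
```

The comparison works because neither side is raw bytes any more: the file layer and `decode_utf8` both produce character strings, whatever encoding each source used.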