Re^2: Somthing related to encoding ??

Re^3: Somthing related to encoding ?? by Joost (Canon) on Aug 10, 2005 at 16:40 UTC
To expand a little on what fishbot_v2 said: you need to make sure your text is in UTF-8, and that perl knows it is UTF-8 encoded (i.e. that it has been decoded into a character string). You can use the Encode module to do that; after that, ord() will work as you expect. Also note that if you just want valid HTML output and your text is already UTF-8 encoded, you can specify the UTF-8 charset for your HTML page, and then you don't need the `&#number;` encoding at all. This is also more efficient in terms of file size. You can set the charset in your Content-Type header (either generate the content type "text/html; charset=utf-8" from a script, or configure the web server to send that content type for static HTML files), or you can use the following meta tag in your HTML head:

```html
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
```

Ps/update: if you want to work with Unicode / UTF-8 text, you should use a recent perl version (5.8.0 or higher if you can get it); Unicode handling has improved a lot in the last releases.

"What should it profit a man, if he should win a flame war, yet lose his cool?"
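A minimal sketch of the two output options described above, assuming a CGI-style script that prints its own header. The phrase from this thread is written with `\x{...}` escapes so the script itself stays plain ASCII; `$entities` and `$utf8_bytes` are illustrative names, not anything from the thread. The sketch also puts numbers on the file-size point.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# The example phrase from this thread as a decoded character string,
# spelled with \x{...} escapes so this file needs no particular encoding.
my $text = "\x{0623}\x{062E}\x{064A} \x{0627}\x{0644}\x{0632}\x{0627}\x{0626}\x{0631}\x{060C}";

# Option 1: numeric character references, one "&#NNNN;" per character.
my $entities = join '', map { '&#' . ord($_) . ';' } split //, $text;

# Option 2: raw UTF-8 bytes, announced via the charset in the header.
my $utf8_bytes = encode('UTF-8', $text);

print "Content-Type: text/html; charset=utf-8\n\n";
print $utf8_bytes, "\n";

# Both strings hold only byte-sized characters, so length() counts bytes.
printf STDERR "entities: %d bytes, utf-8: %d bytes\n",
    length($entities), length($utf8_bytes);
```

For this 11-character phrase the entity form is 75 bytes (ten `&#NNNN;` references plus `&#32;` for the space), while the UTF-8 form is 21 bytes, since each Arabic letter takes two bytes in UTF-8.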
Re^4: Somthing related to encoding ?? by Anonymous Monk on Aug 10, 2005 at 17:59 UTC
Thanks for your reply. I have tried what you said, with no luck :S I'm still getting the same thing. This is what I've tried:

```perl
#!/perl/bin/perl -w
print "Content-type: text/html\n\n";
use strict;
use Encode;
my $string = encode("utf-8","أخي الزائر،");
my $encoded = join '',map { '&#' . ord($_) . ';'} split '',$string;
print("$encoded\n");
```

Is there anything I've forgotten or missed? Thanks for your help :)
Re^5: Somthing related to encoding ?? by fishbot_v2 (Chaplain) on Aug 10, 2005 at 19:07 UTC
In your code, how does Encode know the source encoding? Try:

```perl
use strict;
use warnings;
use Encode;

my $string = decode( "iso-8859-6", "أخي الزائر،" );    # see note
# 'U*' (not just 'U') so that every character is unpacked, not only the first
my $encoded = join '', map { "&#$_;" } unpack 'U*', $string;
print "$encoded\n";

__END__
أخٍ افز ائر�
```

Not sure what happens there in the last character, but the rest is consistent with your desired output. It is likely paste-related at my end, though.

Note: decode converts to utf8, and sets the utf8 flag if there are code points above 127. So `split //` should then work on the string's characters, rather than bytes. Not sure that any of that works on perl before 5.8.
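To make the point about the source encoding concrete, here is a small sketch contrasting a byte string with a decoded character string. It assumes, hypothetically, that the input arrives as UTF-8 bytes (the asker's local file seems to use a single-byte Arabic encoding instead, in which case only the first argument to `decode` changes); `to_entities` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical input: the UTF-8 bytes of the thread's example phrase, as a
# script might read them from a UTF-8 file or a form submission.
my $bytes = encode('UTF-8',
    "\x{0623}\x{062E}\x{064A} \x{0627}\x{0644}\x{0632}\x{0627}\x{0626}\x{0631}\x{060C}");

# Hypothetical helper: one "&#N;" reference per element that split // returns.
sub to_entities {
    my ($str) = @_;
    return join '', map { '&#' . ord($_) . ';' } split //, $str;
}

# Wrong: the string was never decoded, so split // walks over bytes.
my $from_bytes = to_entities($bytes);

# Right: decode first, naming the source encoding, then work on characters.
my $chars      = decode('UTF-8', $bytes);
my $from_chars = to_entities($chars);

print "$from_bytes\n$from_chars\n";
```

Skipping `decode` yields 21 entities (one per UTF-8 byte, starting `&#216;&#163;`), while the decoded string yields the 11 expected entities starting `&#1571;`.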
Re^3: Somthing related to encoding ?? by Anonymous Monk on Aug 10, 2005 at 15:42 UTC
Sorry, I forgot to add code tags to show you the actual output. This is what I got from that site: `أخي الزائر،` and this is what I get when I try to make it myself: `ÃÎí ÇáÒÇÆÑ¡`. The output is extremely different.
Re^4: Somthing related to encoding ?? by fishbot_v2 (Chaplain) on Aug 10, 2005 at 16:14 UTC
It looks like your string encoding in the local case is not UTF-8. You have the same number of characters (11) in both cases: the first is the entity encoding of the Unicode code points, the second is the raw byte values of your local string encoding emitted as if they were code points (looks like iso-8859-6). See Encode for decoding the string to UTF-8, then use `unpack 'U*'` or `ord()` on the decoded string.
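A hedged check of the encoding guess: the mangled output `ÃÎí ÇáÒÇÆÑ¡` corresponds to the byte values 195, 206, 237, 32, 199, 225, 210, 199, 198, 209, 161. Read as windows-1256 (Encode name `cp1256`), the Windows Arabic code page, these bytes give exactly the 11 characters of `أخي الزائر،`. Read as iso-8859-6, bytes 0xED, 0xE1 and 0xA1 map to a kasratan mark, FEH and nothing at all, which would account for the odd characters in the iso-8859-6 attempt in this thread. So cp1256 is an assumption worth trying first, not a certainty; the byte list below is reconstructed from the quoted output rather than taken from the asker's file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte values reconstructed from the mangled output quoted in the thread:
# each Latin-1 character there stands for one raw byte of the local string.
my @raw   = (0xC3, 0xCE, 0xED, 0x20, 0xC7, 0xE1, 0xD2, 0xC7, 0xC6, 0xD1, 0xA1);
my $bytes = pack 'C*', @raw;

# Assumption: the local encoding is windows-1256 ("cp1256" in Encode).
my $chars = decode('cp1256', $bytes);

# ord() on each decoded character gives its Unicode code point.
my $encoded = join '', map { '&#' . ord($_) . ';' } split //, $chars;

print "$encoded\n";
```

If the guess is right, this prints `&#1571;&#1582;&#1610;&#32;&#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;`, which a browser renders as the phrase from the site.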