in reply to Somthing related to encoding ??

i won't use ord

is that a homework restriction or do you have a good reason not to?

Please read how (not) to ask a question.

Re^2: Somthing related to encoding ??
by Anonymous Monk on Aug 10, 2005 at 15:35 UTC

    hi again

    First of all, it's not homework, and thanks for the link on how not to ask. But I really need to find an answer: I'm trying to get proper output for Arabic text. I tried the "ord" function and it didn't work, so I'm asking for another way.

    By the way, I tried "unpack 'C*'" and it works just like the ord function.

    I found an example of the output I want at http://www.ycic.com/arabic/home.htm. If you open the source code of the page you'll see something like أخي الزائر، but when I try to do the same thing I get ÃÎí ÇáÒÇÆÑ¡. I hope you understand what I mean now. Thanks

      To expand a little on what fishbot_v2 said: you need to make sure your text is in utf-8, and that perl knows that it's utf-8 encoded. You can use the Encode module to do that; after that, ord() will work as you expect.
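A minimal sketch of the decode-then-ord() idea follows. It assumes the input arrives as windows-1256 (cp1256) bytes; that encoding is a guess based on the byte values posted in this thread, and the three sample bytes are illustrative, so substitute whatever encoding your data really uses.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they might arrive from the input source.
# Assumption: the data is windows-1256 (cp1256); adjust to match your data.
my $bytes = "\xC3\xCE\xED";

# decode() turns the encoded bytes into a Perl character string.
my $chars = decode('cp1256', $bytes);

# ord() now sees whole characters and returns Unicode code points,
# not individual byte values.
my $entities = join '', map { '&#' . ord($_) . ';' } split //, $chars;

print "$entities\n";
```

With those three bytes this should print &#1571;&#1582;&#1610;, the first three entities from the target page. Without the decode() step, ord() only ever sees single bytes, which is where values such as 195 and 206 come from.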

      Also note that if you just want a valid HTML output, and your text is already utf-8 encoded, you can specify the utf-8 charset in your html page and then you don't need to use the &#number; encoding. This is also more efficient in terms of file-size.

      You can set the charset in your content-type header (either generate the content-type "text/html; charset=utf-8" from a script or configure the webserver to send that content-type for static html files), or you can use the following meta-tag in your HTML head:

      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
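As a rough sketch of the header approach from a script: the snippet below sends the charset in the Content-Type header and writes the text itself as UTF-8, so no numeric entities are needed. The variable names and the sample string are illustrative assumptions, not taken from the original script.

```perl
use strict;
use warnings;

# Encode everything printed to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

# Announce the charset in the HTTP header so the browser does not guess.
my $header = "Content-Type: text/html; charset=utf-8\n\n";

# Sample text given as code points, so this file's own encoding does not matter.
my $greeting = "\x{0623}\x{062E}\x{064A}";

print $header;
print "<p>$greeting</p>\n";
```

The browser then decodes the page as UTF-8 and shows the Arabic characters directly.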

      PS/update: if you want to work with Unicode / utf-8 text, you should use a recent perl version (5.8.0 or higher if you can get it); Unicode handling has improved a lot in the last releases.

        Thanks for your reply. I have tried what you said,

        with no luck :S I'm still getting the same thing.

        This is what I've tried:

        #!/perl/bin/perl -w
        print "Content-type: text/html\n\n";
        use strict;
        use Encode;
        my $string = encode("utf-8","أخي الزائر،");
        my $encoded = join '',map { '&#' . ord($_) . ';'} split '',$string;
        print("$encoded\n");

        Is there anything I've forgotten or missed?

        Thanks for your help :)

      Sorry, I forgot to add code tags to show you the actual output.

      This is what I got from that site: &#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548; and this is what I get when I try to make it myself: &#195;&#206;&#237;&#32;&#199;&#225;&#210;&#199;&#198;&#209;&#161; The output is extremely different.

        It looks like your string in the local case is not UTF-8. You have the same number of characters (11) in both cases: the first is the Unicode code points written as numeric entities, the second is the byte values of your string in its local encoding (looks like windows-1256 rather than iso-8859-6; byte 195, for example, is code point 1571 in that code page).

        See Encode for decoding the bytes into a Perl character string, then use unpack 'U*' or ord() on the decoded string.
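To check that reading, here is a small sketch that takes the eleven byte values posted above, decodes them, and lists the code points with unpack 'U*'. The cp1256 encoding name is an assumption inferred from the byte values, not something confirmed by the original poster.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte values reported earlier in the thread for the local string.
my @byte_values = (195, 206, 237, 32, 199, 225, 210, 199, 198, 209, 161);
my $bytes = pack 'C*', @byte_values;

# Assumption: the bytes are windows-1256 (cp1256).
my $chars = decode('cp1256', $bytes);

# unpack 'U*' returns the code point of every character in the string.
my @code_points = unpack 'U*', $chars;

print join('', map { "&#$_;" } @code_points), "\n";
```

If the assumption holds, the code points come out as 1571 1582 1610 32 1575 1604 1586 1575 1574 1585 1548, which matches the entities on the target page (the space becomes &#32; here instead of a literal space).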