Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Somthing related to encoding ??
by ikegami (Patriarch) on Aug 10, 2005 at 14:02 UTC
    Is chr(nnn) what you want? It takes the numeric code of an ASCII or Unicode character and returns a string containing just that character.
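
    A minimal sketch of what chr() does, using code point 1575 (an Arabic letter that appears in the entities quoted later in this thread) as an arbitrary example. The variable names are illustrative only.

```perl
use strict;
use warnings;

# chr() maps a code point to a one-character string; ord() is its inverse.
my $alef = chr(1575);        # U+0627 ARABIC LETTER ALEF
my $len  = length $alef;     # one character, however many bytes it needs on disk
my $code = ord $alef;        # back to the number we started from

# Encode on output so the wide character is written as UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$alef has length $len and code point $code\n";
```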
Re: Somthing related to encoding ??
by fishbot_v2 (Chaplain) on Aug 10, 2005 at 14:31 UTC

    Do you mean that you have plain text and want to convert every character to a numeric entity? Like so:

    my $str = "perlmonks";
    print map { "&#$_;" } unpack 'C*', $str;
    __END__
    &#112;&#101;&#114;&#108;&#109;&#111;&#110;&#107;&#115;

    Not sure what "i won't use ord" means, but this would seem to satisfy that odd requirement. Assuming your "NORMAL" == ASCII, this works. If you have Unicode text, you need 'U*' as your unpack template instead.
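
    To illustrate the difference between the two templates, here is a small sketch. It assumes the text is already a Perl character string (written here with \x{...} escapes) and uses Encode only to produce the UTF-8 octets that 'C*' is meant to read. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two Arabic letters as a character string (U+0623, U+062E).
my $chars = "\x{623}\x{62E}";

# 'U*' reads the string character by character and yields code points.
my @code_points = unpack 'U*', $chars;

# 'C*' is meant for octets, so apply it to the UTF-8 encoded bytes.
my @octets = unpack 'C*', encode('UTF-8', $chars);

print "code points: @code_points\n";   # 1571 1582
print "octets:      @octets\n";        # 216 163 216 174
```

    Only the 'U*' numbers are usable inside &#...; entities, because entities name code points, not bytes.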

    Update: Your response to the post above suggests that you don't want this. You possibly want the reverse. Incidentally, the word "normal" is meaningless when it comes to data encoding.

Re: Somthing related to encoding ??
by Joost (Canon) on Aug 10, 2005 at 14:38 UTC

      hi again

      First of all, it's not homework, and thanks for the link on how not to ask, but I really need to find an answer. I'm trying to get proper output for Arabic text. I tried the "ord" function and it didn't work, so I'm asking for another way.

      By the way, I tried "unpack 'C*'" and it works just like the ord function.

      I found an example of the output I want at http://www.ycic.com/arabic/home.htm. If you open the source code of the page, you'll see something like أخي الزائر، but when I try to do the same thing I get ÃÎí ÇáÒÇÆÑ¡. I hope you understand what I mean now. Thanks.

        To expand a little on what fishbot_v2 said: you need to make sure your text is in UTF-8, and that Perl knows it is UTF-8 encoded. You can use the Encode module to do that; after that, ord() will work as you expect.
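
        A sketch of the decode-then-ord approach. The byte values are the ones the asker reports getting in this thread; treating them as Windows-1256 (cp1256) is a guess based on those values, not something the asker confirmed. If the input were UTF-8, the first argument to decode() would be 'UTF-8' instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte values the asker reported (assumed here to be Windows-1256 text).
my $bytes = pack 'C*', 195, 206, 237, 32, 199, 225, 210, 199, 198, 209, 161;

# decode() turns encoded bytes into a Perl character string.
my $chars = decode('cp1256', $bytes);

# ord() now sees characters, so it returns Unicode code points.
my $entities = join '', map { '&#' . ord($_) . ';' } split //, $chars;

print "$entities\n";
# prints &#1571;&#1582;&#1610;&#32;&#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;
```

        Apart from the space being written as &#32;, this matches the entities taken from the example site.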

        Also note that if you just want a valid HTML output, and your text is already utf-8 encoded, you can specify the utf-8 charset in your html page and then you don't need to use the &#number; encoding. This is also more efficient in terms of file-size.

        You can set the charset in your content-type header (either generate the content-type "text/html; charset=utf-8" from a script or configure the webserver to send that content-type for static html files), or you can use the following meta-tag in your HTML head:

        <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
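
        For the "generate it from a script" option, a minimal CGI-style sketch might look like this. It assumes the script writes the HTTP header itself on STDOUT and that the page text is already a Perl character string; the sample body is just three letters from the example in this thread.

```perl
use strict;
use warnings;

# Header naming the charset, followed by the blank line that ends the headers.
my $header = "Content-Type: text/html; charset=utf-8\r\n\r\n";

# Character string containing Arabic text (U+0623, U+062E, U+064A).
my $body = "<p>\x{623}\x{62E}\x{64A}</p>\n";

# The :encoding layer writes the characters out as UTF-8 bytes,
# so no &#number; entities are needed.
binmode STDOUT, ':encoding(UTF-8)';
print $header, $body;
```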

        Ps/update: if you want to work with unicode / utf-8 text, you should use a recent perl version (5.8.0 or higher if you can get it); unicode handling has improved a lot in the last releases.

        Sorry, I forgot to add code tags to show the actual output.

        This is what I got from that site:

        &#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;

        and this is what I get when I try to produce it myself:

        &#195;&#206;&#237;&#32;&#199;&#225;&#210;&#199;&#198;&#209;&#161;

        The output is extremely different.

Re: Somthing related to encoding ??
by newroz (Monk) on Aug 10, 2005 at 15:44 UTC
    The following will convert the entities into their Unicode counterparts.
    my $dec_text = "&#1575;&#1604;&#1605;&#1581;&#1578;&#1604;&#1577;";
    $dec_text =~ s/&#(\d+);/pack("U*", $1)/eg;

      Thanks for your reply

      but I don't want to "pack", I want to "unpack" the word to get something like:

      &#1575;&#1604;&#1605;&#1581;&#1578;&#1604;&#1577;

        Sorry for the misunderstanding. I replied in a hurry.
        Here is an approach for the other direction. Try these:
        perl -e 'my $str="czd"; my $s=join " ", map { sprintf "&#%d;", $_ } unpack("U*",$str); print $s;'
        or
        perl -Mutf8 -e 'my $str="عفاريت"; my $s=join " ", map { sprintf "&#%d;", $_ } unpack("U*",$str); print $s;'
        
Re: Somthing related to encoding ??
by wfsp (Abbot) on Aug 10, 2005 at 17:07 UTC
    Here's my take on it.

    You can use ord, but you need to use it on characters, not bytes. Unicode text can contain multi-byte characters, and you have to take that into account.
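
    A small sketch of the bytes-versus-characters point, assuming UTF-8 input. The two bytes below are the UTF-8 encoding of the single character U+0623; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for the single character U+0623 (two bytes: 0xD8 0xA3).
my $bytes    = "\xD8\xA3";
my @per_byte = map { ord($_) } split //, $bytes;   # one number per byte

# After decoding, the same data is one character.
my $chars    = decode('UTF-8', $bytes);
my @per_char = map { ord($_) } split //, $chars;   # one number per character

print "ord on bytes: @per_byte\n";   # 216 163
print "ord on chars: @per_char\n";   # 1571
```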

    The code below uses HTML::Entities::decode_entities which returns Unicode.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Entities;

    my $html_ents = q|&#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;|;
    my $unicode = decode_entities($html_ents);
    $unicode =~ s/(.)/sprintf("%d ", ord $1)/eg;
    print "html entities: $html_ents\n";
    print "ord on chars: $unicode\n";

    Outputs:

    html entities: &#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;
    ord on chars: 1571 1582 1610 32 1575 1604 1586 1575 1574 1585 1548

    Hope this sheds some light on the subject.