Somthing related to encoding ??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Somthing related to encoding ?? by ikegami (Patriarch) on Aug 10, 2005 at 14:02 UTC
Is `chr(nnn)` what you want? It takes an ASCII or Unicode code point number and returns the string containing just that character.
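A minimal sketch of the `chr`/`ord` pairing, in case it helps; the variable names are only illustrative:

```perl
use strict;
use warnings;

# chr() maps a code point number to a one-character string.
my $alef_hamza = chr(1571);    # U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE
my $letter_p   = chr(112);     # ASCII "p"

# ord() is the inverse: it returns the code point of the first character.
printf "%d %d\n", ord($alef_hamza), ord($letter_p);    # prints "1571 112"
```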
Re: Somthing related to encoding ?? by fishbot_v2 (Chaplain) on Aug 10, 2005 at 14:31 UTC
Do you mean that you have plain text and want to entize everything? Like so: `my $str = "perlmonks"; print map { "&#$_;" } unpack 'C*', $str; __END__ &#112;&#101;&#114;&#108;&#109;&#111;&#110;&#107;&#115;` Not sure what "i won't use ord"* means, but this would seem to satisfy that odd requirement. Assuming your "NORMAL" == ASCII, this works. If you have Unicode, you need `'U*'` as your `unpack` template. Update:* Your response to the post above suggests that you don't want this. You possibly want the reverse. Incidentally, the word "normal" is meaningless when it comes to data encoding.
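The `'U*'` variant mentioned above could look roughly like this. It is a sketch that assumes the string already holds decoded characters rather than raw bytes; `entize` is a hypothetical helper name, not something from a module:

```perl
use strict;
use warnings;

# Hypothetical helper: turn every character of a decoded (character) string
# into a decimal numeric entity.
sub entize {
    my ($text) = @_;
    return join '', map { "&#$_;" } unpack 'U*', $text;
}

print entize("perlmonks"), "\n";
# prints &#112;&#101;&#114;&#108;&#109;&#111;&#110;&#107;&#115;

print entize("\x{623}\x{62E}\x{64A}"), "\n";
# prints &#1571;&#1582;&#1610;
```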
Re: Somthing related to encoding ?? by Joost (Canon) on Aug 10, 2005 at 14:38 UTC
"i won't use ord": is that a homework restriction, or do you have a good reason not to? Please read how (not) to ask a question. "What should it profit a man, if he should win a flame war, yet lose his cool?"
Re^2: Somthing related to encoding ?? by Anonymous Monk on Aug 10, 2005 at 15:35 UTC
Hi again. First of all, it's not homework, and thanks for the link on how not to ask, but I really need to find an answer. I'm trying to get proper output with Arabic text. I tried the "ord" function and it didn't work, so I'm asking for another way. By the way, I tried "unpack 'C*'" and it works just like the ord function. I found an example of the output I want: http://www.ycic.com/arabic/home.htm If you open the source code of the page you'll see something like أخي الزائر، but when I try to do the same thing I get ÃÎí ÇáÒÇÆÑ¡ I hope you understand what I mean now. Thanks
Re^3: Somthing related to encoding ?? by Joost (Canon) on Aug 10, 2005 at 16:40 UTC
To expand a little on what fishbot_v2 said: you need to make sure your text is in utf-8, and that perl knows that it's utf-8 encoded. You can use the Encode module to do that; after that, ord() will work as you expect. Also note that if you just want valid HTML output, and your text is already utf-8 encoded, you can specify the utf-8 charset in your html page, and then you don't need to use the `&#number;` encoding. This is also more efficient in terms of file size. You can set the charset in your content-type header (either generate the content-type "text/html; charset=utf-8" from a script or configure the webserver to send that content-type for static html files), or you can use the following meta-tag in your HTML head: `<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">` Ps/update: if you want to work with unicode / utf-8 text, you should use a recent perl version (5.8.0 or higher if you can get it); unicode handling has improved a lot in the last releases. "What should it profit a man, if he should win a flame war, yet lose his cool?"
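As a rough illustration of the point about telling perl the text is utf-8: the sketch below assumes the input arrives as raw UTF-8 bytes (written out by hand here) and uses `decode` and `encode` from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: raw UTF-8 bytes for the three characters U+0623 U+062E U+064A.
my $bytes = "\xD8\xA3\xD8\xAE\xD9\x8A";

# Before decoding, perl sees six bytes, so ord() reports byte values.
my @byte_values = map { ord } split //, $bytes;

# After decoding, perl sees three characters, so ord() reports code points.
my $chars       = decode('UTF-8', $bytes);
my @code_points = map { ord } split //, $chars;

print "ord on bytes: @byte_values\n";    # 216 163 216 174 217 138
print "ord on chars: @code_points\n";    # 1571 1582 1610

# Encode back to bytes before writing into a page served as charset=utf-8.
my $out = encode('UTF-8', $chars);
```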
Re^4: Somthing related to encoding ?? by Anonymous Monk on Aug 10, 2005 at 17:59 UTC
Re^5: Somthing related to encoding ?? by fishbot_v2 (Chaplain) on Aug 10, 2005 at 19:07 UTC
Re^3: Somthing related to encoding ?? by Anonymous Monk on Aug 10, 2005 at 15:42 UTC
Sorry, I forgot to add code tags to show you the output. This is what I got from that site: `&#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;` and this is what I get when I try to make it myself: `&#195;&#206;&#237; &#199;&#225;&#210;&#199;&#198;&#209;&#161;` The output is extremely different.
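One possible reading of the two samples, offered as an assumption rather than a diagnosis: the byte values of the second sample (195 206 237 ...) match the Windows-1256 encoding of the same Arabic phrase, so `ord` may have been applied to undecoded bytes. The sketch below decodes first; `entize_bytes` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes from a named encoding, then emit one
# decimal entity per character.
sub entize_bytes {
    my ($bytes, $encoding) = @_;
    my $chars = decode($encoding, $bytes);
    return join '', map { '&#' . ord($_) . ';' } split //, $chars;
}

# Assumption: these are the raw bytes behind the second sample, read as cp1256.
my $bytes = "\xC3\xCE\xED \xC7\xE1\xD2\xC7\xC6\xD1\xA1";

print entize_bytes($bytes, 'cp1256'), "\n";
# prints &#1571;&#1582;&#1610;&#32;&#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;
```

Note that this version also turns the space into `&#32;`; skipping ASCII characters would be a small change to the `map`.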
Re^4: Somthing related to encoding ?? by fishbot_v2 (Chaplain) on Aug 10, 2005 at 16:14 UTC
Re: Somthing related to encoding ?? by newroz (Monk) on Aug 10, 2005 at 15:44 UTC
The following will convert the entities into their Unicode counterparts. `$dec_text="&#1575;&#1604;&#1605;&#1581;&#1578;&#1604;&#1577;"; $dec_text =~ s/(\&\#(\d+)\;)/pack("U*",$2)/eg;`
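The same substitution can be wrapped in a small sub, sketched here with `chr` in place of `pack`; `decode_numeric_entities` is a hypothetical name, and only decimal entities are handled:

```perl
use strict;
use warnings;

# Hypothetical helper: replace decimal entities (&#NNNN;) with the characters
# they name. Hex entities (&#x...;) and named entities are left untouched.
sub decode_numeric_entities {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

my $decoded = decode_numeric_entities('&#1575;&#1604;&#1605;&#1581;&#1578;&#1604;&#1577;');
printf "%d characters, first is U+%04X\n", length($decoded), ord($decoded);
# prints "7 characters, first is U+0627"
```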
Re^2: Somthing related to encoding ?? by Anonymous Monk on Aug 10, 2005 at 16:12 UTC
Thanks for your reply, but I don't want to "pack"; I want to "unpack" the word to get something like: `&#1575;&#1604;&#1605;&#1581;&#1578;&#1604;&#1577;`
Re^3: Somthing related to encoding ?? by newroz (Monk) on Aug 11, 2005 at 13:28 UTC
Sorry for the misunderstanding; I had replied in a hurry. Here is an alternative approach for the other case. Try these: `perl -e 'my $str="czd"; my $s=join " ", map { sprintf "&#%d;", $_ } unpack("U*",$str); print $s;'` or `perl -Mutf8 -e 'my $str="عفاريت"; my $s=join " ", map { sprintf "&#%d;", $_ } unpack("U*",$str); print $s;'`
Re: Somthing related to encoding ?? by wfsp (Abbot) on Aug 10, 2005 at 17:07 UTC
Here's my take on it. You can use ord, but you need to use it on characters, not bytes. Unicode can contain multi-byte characters and you have to take account of that. The code below uses HTML::Entities::decode_entities, which returns Unicode. `#!/usr/bin/perl use strict; use warnings; use HTML::Entities; my $html_ents = q|&#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;|; my $unicode = decode_entities($html_ents); $unicode =~ s/(.)/sprintf("%d ", ord $1)/eg; print "html entities: $html_ents\n"; print "ord on chars: $unicode\n";` Outputs: `html entities: &#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548; ord on chars: 1571 1582 1610 32 1575 1604 1586 1575 1574 1585 1548` Hope this sheds some light on the subject.