Re: Somthing related to encoding ??
by ikegami (Patriarch) on Aug 10, 2005 at 14:02 UTC
|
Is chr(nnn) what you want? It takes the number of an ASCII or Unicode character (its code point) and returns a string containing just that character.
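A minimal sketch of that suggestion, assuming the number 1571 (the first Arabic letter discussed later in this thread) as the example input. The variable names are illustrative only.

```perl
use strict;
use warnings;

# chr() maps a code point number to a one-character string;
# ord() maps the first character of a string back to its number.
my $alef_hamza = chr(1571);    # U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE
my $number     = ord($alef_hamza);

# Declare the output encoding so the wide character is written as UTF-8
# instead of triggering a "Wide character" warning.
binmode(STDOUT, ':encoding(UTF-8)');
print "$alef_hamza is code point $number\n";
```

Note that chr() only helps when you already have the number; going from text to numbers is the job of ord().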
Re: Somthing related to encoding ??
by fishbot_v2 (Chaplain) on Aug 10, 2005 at 14:31 UTC
|
Do you mean that you have plain text and want to convert every character to a numeric HTML entity? Like so:
my $str = "perlmonks";
print map { "&#$_;" } unpack 'C*', $str;
__END__
&#112;&#101;&#114;&#108;&#109;&#111;&#110;&#107;&#115;
Not sure what "i won't use ord" means, but this would seem to satisfy that odd requirement. Assuming your "NORMAL" == ASCII, this works. If you have Unicode text, you need 'U*' as your unpack template.
Update: Your response to the post above suggests that you don't want this. You possibly want the reverse. Incidentally, the word "normal" is meaningless when it comes to data encoding.
Re: Somthing related to encoding ??
by Joost (Canon) on Aug 10, 2005 at 14:38 UTC
|
|
|
Hi again,
First of all, it's not homework, and thanks for the link on how not to ask, but I really need to find an answer. I'm trying to get proper output with the Arabic language. I tried the "ord" function and it didn't work, so I'm asking for another way.
By the way, I tried "unpack 'C*'" and it works just like the ord function.
I found an example of the output I want at http://www.ycic.com/arabic/home.htm. If you open the source code of the page you'll see something like
أخي الزائر،
but when I try to do the same thing I get
ÃÎí ÇáÒÇÆÑ¡
I hope you understand what I mean now?
Thanks
|
|
To expand a little on what fishbot_v2 said: you need to make sure your text is in UTF-8, and that perl knows that it's UTF-8 encoded. You can use the Encode module to do that; after that, ord() will work as you expect.
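A hedged sketch of that advice: decode the bytes first, then call ord() on characters. The helper name code_points is hypothetical. The second input assumes the text arrived as Windows-1256 (cp1256) rather than UTF-8; that is only a guess, made because the byte values 195 206 237 match the garbled characters shown earlier in the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes from the named encoding,
# then return the code point of each resulting character.
sub code_points {
    my ($bytes, $encoding) = @_;
    my $chars = decode($encoding, $bytes);
    return map { ord } split //, $chars;
}

# The same three Arabic letters stored in two different encodings.
my $utf8_bytes   = "\xD8\xA3\xD8\xAE\xD9\x8A";
my $cp1256_bytes = "\xC3\xCE\xED";    # assumption: Windows-1256 input

print join(' ', code_points($utf8_bytes,   'UTF-8')),  "\n";
print join(' ', code_points($cp1256_bytes, 'cp1256')), "\n";

# Without decoding, ord() only ever sees individual bytes.
print join(' ', map { ord } split //, $cp1256_bytes), "\n";
```

The first two lines print 1571 1582 1610; the last prints 195 206 237, which is what ord() reports when it is handed undecoded bytes.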
Also note that if you just want valid HTML output, and your text is already UTF-8 encoded, you can specify the UTF-8 charset in your HTML page, and then you don't need to use the &#number; encoding. This is also more efficient in terms of file size.
You can set the charset in your Content-Type header (either generate the content type "text/html; charset=utf-8" from a script or configure the web server to send that content type for static HTML files), or you can use the following meta tag in your HTML head:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
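For the "generate it from a script" option, a rough sketch might look like the following. It assumes a plain CGI-style script that writes the header itself; the helper name utf8_response is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a complete CGI response whose header
# declares UTF-8 and whose body is encoded to match.
sub utf8_response {
    my ($html) = @_;    # a decoded (character) string
    my $header = "Content-Type: text/html; charset=utf-8\r\n\r\n";
    return $header . encode('UTF-8', $html);
}

binmode(STDOUT);        # the response is already bytes; write it raw
print utf8_response("<p>\x{623}\x{62E}\x{64A}</p>\n");
```

The point is only that the declared charset and the actual bytes must agree; a framework or web server setting can do the same job.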
PS/update: if you want to work with Unicode / UTF-8 text, you should use a recent perl version (5.8.0 or higher if you can get it); Unicode handling has improved a lot in the last releases.
|
|
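Sorry, I forgot to add code tags to show you the actual output.
This is what I got from that site:
&#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;
and this is what I get when I try to make it myself:
&#195;&#206;&#237; &#199;&#225;&#210;&#199;&#198;&#209;&#161;
The output is extremely different.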
|
|
Re: Somthing related to encoding ??
by newroz (Monk) on Aug 10, 2005 at 15:44 UTC
|
The following will convert the entities into their Unicode counterparts.
$dec_text = "&#1575;&#1604;&#1605;&#1581;&#1578;&#1604;&#1577;";
$dec_text =~ s/&#(\d+);/pack("U*", $1)/eg;
|
|
Sorry for the misunderstanding; I replied in a hurry.
Here is an alternative approach for the other case. Try these:
perl -e 'my $str="czd"; my $s=join " ", map { sprintf "&#%d;", $_ } unpack("U*",$str); print $s;'
or
perl -e 'my $str="عفاريت"; my $s=join " ", map { sprintf "%d;", $_ } unpack("U*",$str); print $s;'
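The same direction (characters to numeric entities) can also be sketched with ord() and a substitution. This is only an illustration: the helper name to_numeric_entities is made up, and it assumes the input is already a decoded character string, not raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character in a decoded
# string with its decimal numeric character reference.
sub to_numeric_entities {
    my ($chars) = @_;
    $chars =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $chars;
}

# The input must already be a character string (for example, the result
# of Encode::decode); here it is written with \x{...} escapes.
print to_numeric_entities("\x{623}\x{62E}\x{64A} \x{627}\x{644}"), "\n";
```

ASCII characters such as the space are left alone, which matches the entity source quoted earlier in the thread.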
Re: Somthing related to encoding ??
by wfsp (Abbot) on Aug 10, 2005 at 17:07 UTC
|
Here's my take on it.
You can use ord, but you need to use it on characters, not bytes. Unicode text can contain multi-byte characters (in UTF-8, for example), and you have to take account of that.
The code below uses HTML::Entities::decode_entities, which returns a Unicode character string.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities;
my $html_ents = q|&#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;|;
my $unicode = decode_entities($html_ents);
$unicode =~s/(.)/sprintf("%d ", ord $1)/eg;
print "html entities: $html_ents\n";
print "ord on chars: $unicode\n";
Outputs:
html entities: &#1571;&#1582;&#1610; &#1575;&#1604;&#1586;&#1575;&#1574;&#1585;&#1548;
ord on chars: 1571 1582 1610 32 1575 1604 1586 1575 1574 1585 1548
Hope this sheds some light on the subject.