Re: Arabic to Hex and Hex to Arabic

G'day thanos1983,

You can get the code points, without firing up the regex engine, like this:

$ perl -Mutf8 -C -E 'my $x = "ﻟﻠﺒﻴﻊ"; say sprintf "%x", ord substr $x, $_, 1 for 0 .. length($x) - 1'
fedf
fee0
fe92
fef4
feca

I don't speak, read or write Arabic; however, checking against Unicode's (PDF) code chart "Arabic Presentation Forms-B", these certainly appear correct.

You asked about getting a "0x" prefix. You can do that with sprintf by changing "%x" to "%#x".

$ perl -Mutf8 -C -E 'my $x = "ﻟﻠﺒﻴﻊ"; say sprintf "%#x", ord substr $x, $_, 1 for 0 .. length($x) - 1'
0xfedf
0xfee0
0xfe92
0xfef4
0xfeca

I don't know anything about UCS, so I might be missing something here. The output you show under "UCS-2" is just the code points from my first one-liner, split into pairs of hex digits (which you could get with substr, still without needing a regex).
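As a rough sketch of that substr idea: the helper name hex_pairs is made up for illustration, and it assumes every character is in the BMP (code point no higher than 0xFFFF), which holds for the string in question. I've written the string with \x{...} escapes so the script doesn't depend on the source file's encoding.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split each character's
# code point into two 2-digit hex strings, high byte first.
# Assumes BMP characters only (code points <= 0xFFFF).
sub hex_pairs {
    my ($str) = @_;
    my @pairs;
    for my $i (0 .. length($str) - 1) {
        # Zero-pad to four hex digits so both halves are always two digits.
        my $hex = sprintf '%04x', ord substr $str, $i, 1;
        push @pairs, substr($hex, 0, 2), substr($hex, 2, 2);
    }
    return @pairs;
}

my $x = "\x{fedf}\x{fee0}\x{fe92}\x{fef4}\x{feca}";
print join(' ', hex_pairs($x)), "\n";
# prints: fe df fe e0 fe 92 fe f4 fe ca
```

For characters outside the BMP this would produce the wrong pairs, so treat it as a sketch for this particular input rather than a general converter.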

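I accidentally generated what you show as "UTF-8" output when I first wrote that one-liner, because I forgot to add the utf8 pragma. Without it, Perl treats the string literal as raw bytes, so ord returns each UTF-8 byte rather than each code point.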

$ perl -C -E 'my $x = "ﻟﻠﺒﻴﻊ"; say sprintf "%x", ord substr $x, $_, 1 for 0 .. length($x) - 1'
ef
bb
9f
ef
bb
a0
ef
ba
92
ef
bb
b4
ef
bb
8a

Anyway, knowing neither Arabic nor UCS, I don't want to draw any inferences from that output. It might, however, provide you with some insights.
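If you want those bytes deliberately rather than by accident, one option is to encode the string explicitly with the core Encode module. This is only a sketch: the helper name utf8_hex_bytes is my own invention, and the string is written with \x{...} escapes so the utf8 pragma isn't needed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the original post): return the UTF-8 bytes
# of a character string as a list of 2-digit hex strings.
sub utf8_hex_bytes {
    my ($str) = @_;
    my $octets = encode('UTF-8', $str);    # characters -> UTF-8 octets
    return map { sprintf '%02x', $_ } unpack 'C*', $octets;
}

my $x = "\x{fedf}\x{fee0}\x{fe92}\x{fef4}\x{feca}";
print join(' ', utf8_hex_bytes($x)), "\n";
# prints: ef bb 9f ef bb a0 ef ba 92 ef bb b4 ef bb 8a
```

The bytes match the accidental output above, three per character.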

The second part of your title was "... Hex to Arabic". Just printing the hex output I first got gives me the original Arabic string.

$ perl -C -E 'say "\x{fedf}\x{fee0}\x{fe92}\x{fef4}\x{feca}"'
ﻟﻠﺒﻴﻊ
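If the hex values arrive as data rather than as literals in the source, a combination of hex and chr should do the same job. A minimal sketch, assuming the input is a list of hex strings (the helper name hex_to_string is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): build a character string
# from a list of hex code points. hex() accepts values with or without a
# leading "0x".
sub hex_to_string {
    my @hex_values = @_;
    return join '', map { chr hex $_ } @hex_values;
}

# Encode output as UTF-8, much as -C does for STDOUT in the one-liners.
binmode STDOUT, ':encoding(UTF-8)';
print hex_to_string(qw(fedf fee0 fe92 fef4 feca)), "\n";
```

This should print the same Arabic string as the one-liner above.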

P.S. I'm using 5.26.0.

— Ken

Re^2: Arabic to Hex and Hex to Arabic by afoken (Chancellor) on Jul 29, 2017 at 10:58 UTC
"I don't know anything about UCS"

UCS is essentially a legacy set of encodings for Unicode. UCS-2 is a two-byte encoding; UCS-4 uses four bytes.

UCS-2 is very similar to UTF-16, except that only characters in the BMP are allowed. UCS-2 has no concept of surrogates. You can read UCS-2 like you would read UTF-16. And if you write UTF-16 without surrogates, you have also written UCS-2. UTF-16 with surrogates is not compatible with UCS-2.

UCS-4 is very similar to UTF-32, capable of encoding 2^31 characters (the sign bit is fixed to 0), but its definition is artificially limited to the range 0..0x10FFFF to stay compatible with other Unicode encodings. Because of this limitation, UCS-4 and UTF-32 encode all characters in an identical way.

See also Universal Character Set, and "Unicode Encodings" and "Beyond Unicode code points" in perlunicode.

More "Unicode and Perl" stuff: perlunicode, perlunicook, perlunifaq, perluniintro, perlunitut, Encode

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: Arabic to Hex and Hex to Arabic by thanos1983 (Parson) on Jul 29, 2017 at 16:02 UTC
Hello kcott,

Thanks, this is one of the reasons that I ask questions on this forum and not on any other. People come up with so many interesting answers and new ideas. Thank you for your time and effort. :D

Seeking for Perl wisdom...on the process of learning...not there...yet!