two byte chars in code

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am writing a program that has to do some substitutions on Japanese two-byte chars. Basically, I need to replace Japanese EUC spaces with ASCII spaces, and a few Japanese words with their English equivalents.

Actually doing something like

s!JPCHARS!ascii chars!;

works fine, but I don't want to put the Japanese chars in my script, because some of the other people who will be working on it have dumb text editors that can't open and save it correctly.

So, for example, if the Japanese space is 0x80 or something like that, I want to make my code something like

s!0x80! !;

I can't figure out how to find the hex code for a Japanese char, though. I have been trying hex, ord, pack and unpack, but I can't seem to get any of them to work (probably because I don't even know which one I should be using).

For example, these are some little snippets I found in the various places I searched for an answer:

my $str1 = qq(あ);
my $str2 = qq(え);

print oct($str1); # gives me 0
print oct($str2); # gives me 0
print pack "CC", $str1; #gives me nothing
print pack "CC", $str2; #gives me nothing
print unpack "H2", $str1; #gives me e3
print unpack "H2", $str2; #gives me e3
printf("0x%02x\n", $str1); #gives me 0x00
printf("0x%02x\n", $str2); #gives me 0x00

The problem is that different Japanese chars are giving me the same value in the output, so I am obviously not using these correctly, but I can't find anything that helps me with regard to double bytes.

I realize I am way off track, so if anyone can point me in the right direction I would be most grateful!

Thank you!



Re: two byte chars in code
by dave_the_m (Monsignor) on May 28, 2004 at 10:32 UTC

s/\x{12ab}/ /g

Dave.
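The \x{...} escape matches a character by its Unicode code point, so it only lines up with EUC data once the bytes have been decoded into characters. Here is a minimal sketch of that, assuming the input really is EUC-JP and that the "Japanese space" is the ideographic space U+3000 (bytes A1 A1 in EUC-JP). The helper name ascii_spaces is made up for illustration; decode and encode come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: replace ideographic spaces with ASCII spaces
# in a string of EUC-JP bytes. Assumes the input is valid EUC-JP.
sub ascii_spaces {
    my ($euc_bytes) = @_;
    my $text = decode('euc-jp', $euc_bytes);   # bytes -> characters
    $text =~ s/\x{3000}/ /g;                   # U+3000 IDEOGRAPHIC SPACE
    return encode('euc-jp', $text);            # characters -> bytes
}

# "\xA4\xA2" is hiragana A in EUC-JP; "\xA1\xA1" is the ideographic space.
# Only ASCII appears in the source, so any text editor can save it safely.
my $in  = "\xA4\xA2\xA1\xA1\xA4\xA2";
my $out = ascii_spaces($in);
print unpack("H*", $out), "\n";                # a4a220a4a2
```

Substituting on the raw bytes with s/\xA1\xA1/ /g would avoid the decode step, but it can match across a character boundary (the second byte of one character followed by the first byte of the next), which is why the sketch decodes first.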


Re: two byte chars in code
by PodMaster (Abbot) on May 28, 2004 at 10:31 UTC

perldoc -f ord

    ord EXPR
    ord     Returns the numeric (ASCII or Unicode) value of the first
            character of EXPR. If EXPR is omitted, uses "$_". For the
            reverse, see the chr entry elsewhere in this document. See the
            utf8 manpage for more about Unicode.
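As a rough illustration of how ord applies to the question: unpack "H2" returns only the first two hex digits, that is, the first byte, and the two test strings appear to be UTF-8, where both characters start with byte E3. The sketch below dumps every byte with "H*" and then decodes the bytes so that ord sees whole characters. The helper name code_points and the assumption that the data is UTF-8 are mine; for EUC data the encoding name 'euc-jp' would be passed instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list the code points of the characters
# encoded in $bytes, given the name of the encoding.
sub code_points {
    my ($encoding, $bytes) = @_;
    my $text = decode($encoding, $bytes);      # bytes -> characters
    return map { sprintf "U+%04X", ord($_) } split //, $text;
}

# UTF-8 bytes of hiragana A and hiragana E, written as ASCII-only escapes.
my $utf8 = "\xE3\x81\x82\xE3\x81\x88";

print unpack("H*", $utf8), "\n";                     # e38182e38188
print join(" ", code_points('UTF-8', $utf8)), "\n";  # U+3042 U+3048
```

The code points printed this way are the values that go inside \x{...} in a regex applied to decoded text.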

