Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am making a program that has to do some substitutions on Japanese two-byte characters. Basically, I need to replace Japanese EUC (full-width) spaces with ASCII spaces, and a few Japanese words with their English equivalents.

Doing something like

s!JPCHARS!ascii chars!;

works fine, but I don't want to put the Japanese characters in my script, because some of the other people who will be working on it have dumb text editors that can't open and save it correctly.

So, for example, if the Japanese space were 0x80 or something like that, I would like my code to look something like
s!0x80! !;
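A minimal sketch of that idea, assuming the input is raw EUC-JP bytes. In a Perl regex a byte is written as a `\xHH` escape (the literal text `0x80` would only match the four characters "0x80"), and in EUC-JP the full-width space U+3000 is the byte pair A1 A1. The helper name `euc_spaces_to_ascii` is hypothetical. Matching raw bytes has a pitfall: the pattern can straddle a character boundary (the trailing A1 of one character followed by the leading A1 of the next), so decoding to characters first is the safer route.

```perl
use strict;
use warnings;

# Hypothetical helper: turn EUC-JP full-width spaces into ASCII spaces.
# It works on raw EUC-JP bytes, so the script itself stays pure ASCII.
sub euc_spaces_to_ascii {
    my ($bytes) = @_;
    # U+3000 IDEOGRAPHIC SPACE is the byte pair A1 A1 in EUC-JP.
    # \xHH is a hex escape in a regex; the text "0x80" would match literally.
    $bytes =~ s/\xA1\xA1/ /g;
    return $bytes;
}

print euc_spaces_to_ascii("A\xA1\xA1B"), "\n";    # prints "A B"
```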

I can't figure out how to find the hex code for a Japanese character, though. I am trying to use hex, ord, pack, or unpack, but I can't seem to get it to work (probably because I don't even know which one I should be using).

For example, these are some snippets I found in various places while searching for an answer:
my $str1 = qq(あ);
my $str2 = qq(え);
print oct($str1);             # gives me 0
print oct($str2);             # gives me 0
print pack "CC", $str1;       # gives me nothing
print pack "CC", $str2;       # gives me nothing
print unpack "H2", $str1;     # gives me e3
print unpack "H2", $str2;     # gives me e3
printf("0x%02x\n", $str1);    # gives me 0x00
printf("0x%02x\n", $str2);    # gives me 0x00

The problem is that different Japanese characters give me the same value, so I am obviously not using these functions correctly, but I can't find anything that explains how to handle double-byte characters.
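A small sketch of why the snippets above disagree with expectations. `oct`, `printf "%x"` and `pack "CC"` treat the string as a number, which is 0 for non-numeric text. `unpack "H2"` returns only the first byte, and in UTF-8 both あ (E3 81 82) and え (E3 81 88) start with E3. That also suggests the script was saved as UTF-8 rather than EUC-JP (in EUC-JP they would start with A4). Dumping every byte with `unpack "H*"` or the hypothetical helper `hex_bytes` shows the full sequence, written here with `\x` escapes so the source stays ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper: every byte of a byte string as two-digit hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# UTF-8 encodings of the two hiragana from the question.
my $hira_a = "\xE3\x81\x82";    # U+3042 HIRAGANA LETTER A
my $hira_e = "\xE3\x81\x88";    # U+3048 HIRAGANA LETTER E

print unpack('H2', $hira_a), "\n";    # e3      (first byte only)
print unpack('H*', $hira_a), "\n";    # e38182  (all bytes)
print hex_bytes($hira_e), "\n";       # E3 81 88
```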

I realize I am way off track, so if anyone can point me in the right direction, I would be most grateful!

Thank you!

Re: two byte chars in code
by dave_the_m (Monsignor) on May 28, 2004 at 10:32 UTC
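    Well, assuming your string has already been decoded into Perl characters (utf8), then something like
    s/\x{12ab}/ /g
    should do the trick, where 12ab stands for the hex Unicode code point of the character you want to replace.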
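A sketch of that approach, assuming the input is EUC-JP bytes: decode with the core `Encode` module, substitute on characters, then encode back. The ideographic space is U+3000. The word table `%english` (with 日本, U+65E5 U+672C, mapped to "Japan") and the sub `translate_euc` are hypothetical examples; the `\x{...}` escapes keep the script pure ASCII. Working on decoded characters also avoids the byte-boundary pitfall of matching raw EUC-JP bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical word table; keys are written with \x{...} escapes.
my %english = (
    "\x{65E5}\x{672C}" => 'Japan',    # U+65E5 U+672C
);
# Longest keys first so longer words win over their prefixes.
my $word_re = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } keys %english;

# Hypothetical helper: EUC-JP bytes in, EUC-JP bytes out.
sub translate_euc {
    my ($euc_bytes) = @_;
    my $text = decode('euc-jp', $euc_bytes);    # bytes -> characters
    $text =~ s/\x{3000}/ /g;                    # ideographic space -> ASCII space
    $text =~ s/($word_re)/$english{$1}/g;       # Japanese words -> English
    return encode('euc-jp', $text);             # characters -> bytes
}

print translate_euc("A\xA1\xA1B"), "\n";    # prints "A B"
```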

    Dave.

Re: two byte chars in code
by PodMaster (Abbot) on May 28, 2004 at 10:31 UTC
    Are you sure you're using ord? Looks to me like you're using oct :)
    perldoc -f ord
        ord EXPR
        ord     Returns the numeric (ASCII or Unicode) value of the first
                character of EXPR. If EXPR is omitted, uses "$_". For the
                reverse, see the chr entry elsewhere in this document. See the
                utf8 manpage for more about Unicode.
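A short sketch of `ord` once the text is decoded: it then returns the Unicode code point (U+3042 for あ, U+3048 for え), whereas on undecoded UTF-8 bytes it only sees the first byte, 0xE3. The helper `code_points` is hypothetical, and the input is given as UTF-8 byte escapes so the script contains no Japanese characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: "U+XXXX" for every character of a decoded string.
sub code_points {
    my ($chars) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $chars;
}

# The two hiragana from the question, as UTF-8 bytes, decoded to characters.
my $chars = decode('UTF-8', "\xE3\x81\x82\xE3\x81\x88");
print code_points($chars), "\n";                  # U+3042 U+3048
printf "0x%X\n", ord("\xE3\x81\x82");             # 0xE3 (raw bytes: first byte only)
```

The reverse direction is `chr`, so `chr(0x3000)` gives the ideographic space without typing it.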
    
    
