in reply to Re: Playing with "funny" chars
in thread Playing with extended chars

(oops, I wanted to reply to the first post but clicked here by accident ;) ).

My recommendation is to use Perl 5.8.0 or later and look at perldoc Encode, perldoc open, and perldoc -f open. If tr doesn't work because your characters are encoded as two bytes each, you can do

use Encode qw(decode_utf8);
$s = decode_utf8($s);

That will convert the string into the internal representation where characters are characters and you don't have to worry about how many bytes they need for encoding.
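As a rough sketch of what that buys you (the variable names and the escaped byte string are just for illustration, and the bytes are assumed to be the UTF-8 encoding of "áéíóú"), compare the length before and after decoding, then run tr on the decoded string:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Raw UTF-8 bytes for "áéíóú", written as escapes so the example does not
# depend on how this file itself is saved.
my $bytes = "\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA";

# Decode the bytes into Perl's internal character representation.
my $chars = decode_utf8($bytes);

printf "before decode: length %d\n", length $bytes;   # 10 bytes
printf "after decode:  length %d\n", length $chars;   # 5 characters

# tr now works per character; \x{E1} etc. are the code points of á é í ó ú.
(my $plain = $chars) =~ tr/\x{E1}\x{E9}\x{ED}\x{F3}\x{FA}/aeiou/;
print "$plain\n";   # aeiou
```

The search list is written with \x{...} escapes here so the script itself stays plain ASCII; literal accented characters in the source need the utf8 pragma.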

Re^3: Playing with "funny" chars
by deibyz (Hermit) on Sep 27, 2004 at 13:31 UTC
    I think the problem is not in the string (I'm using perl 5.8.5, because 5.8.0 had some bugs on RedHat), but in the tr operator itself.

    The first attempt works like this:

    perl -e '$_="áéíóú";tr/áéíóú/aeiou/;print'
    aeaoauauau

    It seems that "á" is treated as two characters, maybe "´" and "a", and each one gets its own matching char ("a" and "e").

    BTW, the encode and decode functions return values that make me think the string is well formed, and that it is tr// that is doing the wrong thing. Am I completely lost?

      If you have UTF-8 encoded strings in your program file, you need to use the utf8 pragma (see perldoc utf8).

      use utf8;
      $s = 'holáéíóúon';
      $s =~ tr/áéíóú/aeiou/;
      print $s;   # prints holaeiouon
      The code above may show the double characters explicitly since perlmonks.org is served as ISO-8859-1.
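      To see where the "aeaoauauau" output comes from, here is a small sketch that spells out the bytes by hand. It assumes the one-liner was typed in a UTF-8 terminal, so each accented vowel arrives as two bytes (0xC3 plus one more); the variable names are made up for the example.

```perl
use strict;
use warnings;

# Without "use utf8", Perl sees UTF-8 source text as raw bytes.
# These escapes are the ten bytes that encode "áéíóú".
my $s = "\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA";

# The same byte-wise search list that tr/áéíóú/aeiou/ amounts to without
# the pragma: ten search bytes against five replacement characters.
(my $out = $s) =~ tr/\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA/aeiou/;

print "$out\n";   # aeaoauauau
```

      The first 0xC3 maps to "a" and its later repeats in the search list are ignored; 0xA1 maps to "e", 0xA9 to "o", and the remaining bytes all get "u" because tr repeats the last replacement character when the replacement list is shorter than the search list.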