PerlMonks |
Re: How to reverse a (Unicode) string
by Juerd (Abbot) on Jan 09, 2008 at 22:03 UTC ( [id://661519]=note )
If you entered this using a UTF-8 editor, you forgot to "use utf8;" to notify Perl of this fact. You may be dealing with the string "\no\x{C3}\x{A4}u" instead of the intended "\no\x{e4}u"!
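A minimal sketch of that effect, assuming the script file itself is saved as UTF-8. The literal "oäu" is only an illustrative stand-in, not the string from the original question. The same literal is counted once under "use utf8" and once inside a "no utf8" block:

```perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

# With the pragma, the literal is decoded: one character per letter.
my $with_pragma = length("oäu");

my $without_pragma;
{
    no utf8;    # lexically scoped: the same literal is now read byte by byte
    $without_pragma = length("oäu");    # "ä" counts as its two bytes, C3 A4
}

print "with use utf8: $with_pragma, without: $without_pragma\n";
```

Only lengths are printed here, so no output encoding layer is needed for the demonstration.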
reverse works on characters. If you have a byte string, every character represents the equivalent byte. If you have a Unicode text string, reverse properly reverses based on Unicode code points.
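To make the difference concrete, here is a small sketch that reverses the same data twice: once as raw UTF-8 bytes and once after decoding. The helper hex_chars and the sample string are made up for illustration; decode comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: show each character of a string as a hex ordinal.
sub hex_chars {
    my ($string) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $string;
}

# "ä" (U+00E4) is the two bytes C3 A4 in UTF-8.
my $bytes = "o\x{C3}\x{A4}u";           # byte string: 4 characters, one per byte
my $text  = decode('UTF-8', $bytes);    # text string: 3 characters

# Assigning to a scalar puts reverse in scalar context (string reversal).
my $reversed_bytes = reverse $bytes;
my $reversed_text  = reverse $text;

printf "bytes: length %d, reversed: %s\n", length($bytes), hex_chars($reversed_bytes);
printf "text:  length %d, reversed: %s\n", length($text),  hex_chars($reversed_text);
# prints:
# bytes: length 4, reversed: 75 A4 C3 6F
# text:  length 3, reversed: 75 E4 6F
```

The reversed byte string contains A4 C3, which is no longer valid UTF-8. reverse did exactly what it was asked to do; it was simply handed bytes instead of text.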
This suggests that decoding is a workaround. It is not; it is something you should always do when dealing with text data!
Perl has no idea, and cannot be told, what kind of strings you have: binary or text. Without "use utf8" you don't necessarily have byte strings, but if you have text strings, they're interpreted as ISO-8859-1 rather than UTF-8. Note that ISO-8859-1 is a Unicode encoding; it just doesn't support all of the characters.
The rest of your post is accurate, but I wanted to respond so that newbies don't get a negative impression of Perl's Unicode support from your post. Perl's Unicode support is great, but the programmer MUST learn the difference between Unicode and UTF-8, and the difference between text data and binary data.
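One way to keep text and binary data apart is to decode at the input boundary, work on characters, and encode again on the way out. The following is only a sketch under that assumption; reverse_utf8_bytes is a hypothetical helper name, and the sample bytes are made up. The last lines also show the ISO-8859-1 point: byte E4 decodes to code point U+00E4.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: accept UTF-8 bytes, return the reversed text as UTF-8 bytes.
sub reverse_utf8_bytes {
    my ($input_bytes) = @_;

    # Decode bytes to text; FB_CROAK makes malformed input die instead of passing silently.
    my $text = decode('UTF-8', $input_bytes, Encode::FB_CROAK);

    my $reversed = reverse $text;    # scalar context: reverses characters

    return encode('UTF-8', $reversed);    # back to bytes for output
}

my $out = reverse_utf8_bytes("o\x{C3}\x{A4}u");    # UTF-8 bytes for "oäu"
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $out;
# prints: 75 C3 A4 6F   (the UTF-8 bytes for "uäo")

# ISO-8859-1 bytes map one-to-one onto the first 256 Unicode code points.
my $latin1_char = decode('ISO-8859-1', "\x{E4}");
printf "U+%04X\n", ord $latin1_char;
# prints: U+00E4
```

The two-byte sequence C3 A4 stays in order in the output because the reversal happened on the decoded character, not on its bytes.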