perl handling of utf8

evilgoblin has asked for the wisdom of the Perl Monks concerning the following question:

I have a couple of questions. First, how do I single out one character by its Unicode code point for an operation such as replacement or removal? In a regex, do I use \x or \X, and what is the difference between the two?
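A minimal sketch of the difference. It assumes a reasonably recent Perl (5.12 or later, where perlre describes \X as matching an extended grapheme cluster), and the sample strings are made up for illustration. In a regex, \x{...} matches exactly the one code point you name, while \X matches one user-visible character, which may span several code points.

```perl
use strict;
use warnings;

# Hypothetical sample: "cafe" followed by U+0301 COMBINING ACUTE ACCENT,
# i.e. an accented e spelled as two code points.
my $s = "cafe\x{301}";
print length($s), "\n";              # prints 5: length counts code points

# \x{...} names exactly one code point, so this removes only the accent.
(my $no_accent = $s) =~ s/\x{301}//g;
print "$no_accent\n";                # prints cafe

# \X matches one extended grapheme cluster, so "e" plus its accent
# is captured as a single unit.
my @graphemes = $s =~ /(\X)/g;
print scalar(@graphemes), "\n";      # prints 4

# Replacing one specific code point: U+00E9 (precomposed e-acute) with "e".
(my $plain = "caf\x{E9}") =~ s/\x{E9}/e/g;
print "$plain\n";                    # prints cafe
```

So for "this exact code point" (replace, remove, count) \x{...} is the tool; \X is for walking a string one visible character at a time.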
My other question is about the "eq" operator. Say $var1 is a byte sequence with the internal UTF-8 flag on, and $var2 is the exact same byte sequence with the UTF-8 flag off. What would "$var1 eq $var2" return? I tested this by reading in a string, calling Encode::_utf8_on($string) on it, and then comparing the two. The result is true; could someone explain this behaviour? I would have expected one variable having the flag on and the other off to give a false result regardless of the bytes they contain. Thanks
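A short sketch of what eq actually compares, with invented sample strings. The idea it illustrates: eq compares strings as sequences of characters, and the UTF-8 flag only records how those characters are stored internally, so the flag by itself never makes two strings unequal. Encode::_utf8_on is the function used in the question; utf8::upgrade and utf8::is_utf8 are core built-ins.

```perl
use strict;
use warnings;
use Encode ();

# Case 1: pure ASCII bytes. Turning the flag on does not change the characters.
my $ascii_off = "hello";
my $ascii_on  = "hello";
Encode::_utf8_on($ascii_on);
print "ascii: ", ($ascii_off eq $ascii_on ? "equal" : "different"), "\n";

# Case 2: same character, different internal storage.
my $latin1   = "\xE9";            # U+00E9, stored as the single byte 0xE9, flag off
my $upgraded = $latin1;
utf8::upgrade($upgraded);         # still U+00E9, now stored as 0xC3 0xA9, flag on
print "upgrade: ", ($latin1 eq $upgraded ? "equal" : "different"), "\n";

# Case 3: same bytes, different characters.
my $bytes   = "\xC3\xA9";         # two characters: U+00C3, U+00A9
my $flagged = $bytes;
Encode::_utf8_on($flagged);       # bytes reinterpreted: now one character, U+00E9
print "flag flip: ", ($bytes eq $flagged ? "equal" : "different"), "\n";
```

This prints "equal", "equal", "different". If the string in the test above was plain ASCII, case 1 explains the true result; with non-ASCII input, flipping the flag with _utf8_on changes which characters the string holds. Encode lists _utf8_on among its internals-level functions, and Encode::decode('UTF-8', $bytes) is the usual way to turn UTF-8 bytes into characters.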

Re: perl handling of utf8 by Zaxo (Archbishop) on May 02, 2006 at 03:44 UTC
I give an example of replacing unprintable ASCII characters with UTF-8 in Printing the Unprintable.
After Compline,
Zaxo
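As a rough illustration of the idea Zaxo mentions (this is not his code from Printing the Unprintable, which is not reproduced here), one common approach maps ASCII control characters to the Unicode "Control Pictures" block. The helper name show_unprintable is hypothetical.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: make ASCII control characters visible.
# C0 controls 0x00-0x1F map to U+2400..U+241F (e.g. TAB -> U+2409),
# and DEL (0x7F) maps to U+2421 SYMBOL FOR DELETE.
sub show_unprintable {
    my ($str) = @_;
    (my $out = $str) =~ s/([\x00-\x1F])/chr(0x2400 + ord $1)/ge;
    $out =~ s/\x7F/\x{2421}/g;
    return $out;
}

my $visible = show_unprintable("tab\there\r\n");
print "$visible\n";
```

The /e modifier evaluates the replacement as Perl code, so each control character is turned into its picture symbol by code point arithmetic rather than a lookup table.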
Re: perl handling of utf8 by Anonymous Monk on May 02, 2006 at 00:38 UTC
"in the regex do I use \x or \X ? what is the difference between the 2?"
rtfm perlre