Thank you, that's very useful as well. In what sense is using length to count Unicode characters a bug waiting to happen, though? Now I'll admit I've just learned first hand that this is indeed dangerous territory to tread, but the perldoc entry for length (which I checked beforehand to make sure it wouldn't count bytes -- hence my confusion) says:
Like all Perl character operations, length() normally deals in logical characters, not physical bytes. For how many bytes a string encoded as UTF-8 would take up, use length(Encode::encode_utf8(EXPR)) (you'll have to use Encode first).
So if used right, it should work, shouldn't it? Do you have any specific languages or complex data in mind with which it might fail?
In reply to Re^2: length() miscounting UTF8 characters?
by AppleFritter
in thread length() miscounting UTF8 characters?
by Anonymous Monk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |