in reply to Re^4: substr function
in thread substr function

Huh?

If <194> is a character entity that represents a "non-ASCII character," then that is precisely what makes my question germane: it bears directly on how you measure the length of the text in which the entity occurs.

For Unicode text, there are at least three valid, meaningful ways to measure the size of the text: in bytes, in characters (code points), and in grapheme clusters.
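To make the three measurements concrete, here is a minimal sketch in Python (standard library only, used purely for illustration). The function name `measure` and the sample strings are hypothetical. UTF-8 is assumed for the byte count, and the grapheme count is only an approximation: it counts every code point that is not a combining mark, which is not full Unicode grapheme segmentation and will miscount things like emoji sequences.

```python
import unicodedata


def measure(text: str) -> dict:
    """Return three size measurements for a Unicode string."""
    # Bytes: depends on the encoding; UTF-8 is an assumption here.
    byte_count = len(text.encode("utf-8"))

    # Characters: Python strings are sequences of code points.
    code_points = len(text)

    # Grapheme clusters (approximation): treat each non-combining
    # code point as the start of a new cluster.
    clusters = sum(1 for ch in text if unicodedata.combining(ch) == 0)

    return {
        "bytes": byte_count,
        "code_points": code_points,
        "grapheme_clusters": clusters,
    }


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
decomposed = measure("e\u0301")

# U+00C2 (decimal 194), a single precomposed non-ASCII character.
precomposed = measure("\u00c2")
```

For the decomposed sample the three numbers all differ (3 bytes, 2 code points, 1 cluster), which is why "length" needs to be pinned down before slicing text by position.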

Re^6: substr function
by ikegami (Patriarch) on Jan 13, 2011 at 17:30 UTC

    Oops, it makes bytes vs encoded characters moot, but characters vs graphemes is still relevant.

    By the way, there is indeed a fourth: Some characters are double-wide, so you could also talk about visual width.
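
Visual width can be roughly estimated as well. The sketch below is a simplified heuristic in Python (standard library only), not a full terminal-width algorithm. The name `display_width` is hypothetical. It assumes that characters with East Asian Width class W or F take two columns, that combining marks take none, and that everything else (including the "ambiguous" class) takes one.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate how many terminal columns a string occupies."""
    width = 0
    for ch in text:
        # Combining marks are drawn on top of the previous character.
        if unicodedata.combining(ch):
            continue
        # 'W' (wide) and 'F' (fullwidth) characters take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


cjk_width = display_width("\u65e5\u672c\u8a9e")  # three CJK ideographs
ascii_width = display_width("abc")
```

Under these assumptions the three ideographs occupy six columns while "abc" occupies three, even though both strings are three code points long.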