in reply to Re^3: How does the built-in function length work?
in thread How does the built-in function length work?
perl has to assume a character encoding.
Not at all. If it must assume an encoding, and that encoding is iso-8859-1 for
"\x{E4}" =~ /\w/
then what encoding is assumed for the following?
"\x{2660}" =~ /\w/
It never deals with any encoding. It always deals with string elements (characters). And those string elements (characters) are required to be Unicode code points.
It's entirely up to you to create a string with the right elements, which may or may not involve character encodings.
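A minimal sketch of that point, assuming the core Encode module is available (the variable names are only for illustration): the same one-character string can be built with or without a decoding step, and an encoding only comes into play while the string is being created.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Built directly from a code point: no encoding involved.
my $from_escape = "\x{2660}";

# Built from a code point number: still no encoding involved.
my $from_chr = chr(0x2660);

# Built by decoding bytes obtained from somewhere: an encoding is involved,
# but only in the step that creates the string.
my $bytes       = "\xE2\x99\xA0";          # U+2660 encoded as UTF-8
my $from_decode = decode('UTF-8', $bytes);

printf "%d %d %d\n",
    length($from_escape), length($from_chr), length($from_decode);   # 1 1 1
print "same\n"
    if $from_escape eq $from_chr && $from_chr eq $from_decode;       # same
```

Whichever way the string was created, what the regex engine sees afterwards is one element with the value 0x2660.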
Or what do you think it is, if not ISO-8859-1?
A Unicode code point, regardless of the state of the UTF8 flag.
In short, you're overcomplicating things. It's NOT:
Each character is expected to be an iso-8859-1 byte if UTF8=0 or a Unicode code point if UTF8=1.
It's simply:
Each character is expected to be a Unicode code point.
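Here is a rough sketch of what that means in practice. It stores the same character both ways and compares the results. The explicit `/u` flag (Perl 5.14+) is my addition, to request Unicode rules regardless of how the string happens to be stored. Without it, or without `use feature 'unicode_strings'`, some perls may treat the UTF8=0 copy differently.

```perl
use strict;
use warnings;
use 5.014;    # needed for the /u regex flag

my $downgraded = "\x{E4}";
utf8::downgrade($downgraded);    # stored as one byte, UTF8 flag off

my $upgraded = "\x{E4}";
utf8::upgrade($upgraded);        # stored as two bytes, UTF8 flag on

for my $s ($downgraded, $upgraded) {
    printf "UTF8=%d length=%d ord=U+%04X word=%s\n",
        utf8::is_utf8($s) ? 1 : 0,
        length($s),
        ord($s),
        $s =~ /\w/u ? 'yes' : 'no';
}
# UTF8=0 length=1 ord=U+00E4 word=yes
# UTF8=1 length=1 ord=U+00E4 word=yes

print "equal\n" if $downgraded eq $upgraded;    # equal
```

The internal storage differs, but `length`, `ord`, `eq` and the match all see the same single element, U+00E4.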
Re^5: How does the built-in function length work?
by moritz (Cardinal) on Dec 03, 2011 at 06:22 UTC
by ikegami (Patriarch) on Dec 03, 2011 at 07:53 UTC