You are right, I didn't consider how indexing works in a buffer that contains multi-byte characters. There is an ugly solution for that: a new type of scalar that stores two numbers, one for the byte index and one for the codepoint index. But let's not go there.
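To make the indexing problem concrete, here is a minimal sketch using only core `index`, `length`, `substr` and the `Encode` module. The sample string and variable names are my own; the point is just that the same substring sits at a different offset depending on whether you count bytes or codepoints.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A decoded (character) string: index() and length() count codepoints.
my $chars = "na\x{EF}ve caf\x{E9}";        # "naive cafe" with i-diaeresis and e-acute

# The same text as UTF-8 octets: index() and length() count bytes.
my $bytes = encode('UTF-8', $chars);

my $cp_index   = index($chars, 'caf');     # offset in codepoints
my $byte_index = index($bytes, 'caf');     # offset in bytes

printf "codepoint index: %d, byte index: %d\n", $cp_index, $byte_index;

# Converting a byte offset to a codepoint offset means decoding
# everything in front of it, which is why one number is not enough.
my $converted = length(decode('UTF-8', substr($bytes, 0, $byte_index)));
printf "byte index %d corresponds to codepoint index %d\n", $byte_index, $converted;
```

The two offsets differ by one here because the i-diaeresis before "caf" takes two bytes in UTF-8.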
Now I'm even more at a loss as to how p5's Unicode handling could be made more robust. Maybe a three-way flag (byte/codepoint/unknown) could be introduced, and operations on incompatible types could then at least warn (probably with a warning that is not enabled by default), but not coerce. That would also provide at least some measure of introspection.
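A rough sketch of what I mean, done as a wrapper class rather than in the core. Everything here is hypothetical: `TaggedStr`, `$WARN_MISMATCH` and the rule that a byte/codepoint mix yields "unknown" are my assumptions for illustration, not anything that exists in Perl 5.

```perl
use strict;
use warnings;

# Hypothetical wrapper class, for illustration only; not part of Perl 5.
package TaggedStr;

# Mismatch warnings are off unless explicitly enabled.
our $WARN_MISMATCH = 0;

sub new {
    my ($class, $value, $kind) = @_;
    $kind = 'unknown' unless defined $kind;
    die "bad kind '$kind'\n" unless $kind =~ /\A(?:byte|codepoint|unknown)\z/;
    return bless { value => $value, kind => $kind }, $class;
}

sub value { $_[0]{value} }
sub kind  { $_[0]{kind} }    # the introspection part

sub concat {
    my ($self, $other) = @_;
    my ($lk, $rk) = ($self->kind, $other->kind);

    # Assumption: any mix of kinds leaves the result tagged 'unknown'.
    my $kind = $lk eq $rk ? $lk : 'unknown';

    # Warn only for a definite byte/codepoint clash.
    if ($WARN_MISMATCH && $lk ne $rk && $lk ne 'unknown' && $rk ne 'unknown') {
        warn "concatenating $lk string with $rk string\n";
    }

    # The wrapper does no encode/decode of its own.
    return TaggedStr->new($self->value . $other->value, $kind);
}

package main;

my $octets = TaggedStr->new("caf\xC3\xA9", 'byte');
my $text   = TaggedStr->new("caf\x{E9}",   'codepoint');

$TaggedStr::WARN_MISMATCH = 1;
my $mixed = $octets->concat($text);    # warns, result is tagged 'unknown'
print $mixed->kind, "\n";
```

Doing this in the core would of course be a much bigger job, since every string operation would have to propagate the flag.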
In reply to Re^6: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
by moritz
in thread Windows-1252 characters from \x{0080} thru \x{009f}
by Jim