in reply to Re^5: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
in thread Windows-1252 characters from \x{0080} thru \x{009f}
You are right: I hadn't considered how indexing works in a buffer that contains multi-byte characters. There is an ugly solution for that, which would be a new type of scalar that stores two numbers, one for the byte index and one for the codepoint index. But let's not go there.
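To make the indexing problem concrete, here is a small sketch (my own illustration, not code from earlier in the thread) showing that the same character sits at different offsets depending on whether you count codepoints or UTF-8 bytes. The sample string and the variable names are arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "naive" with i-diaeresis (U+00EF), a space, the euro sign (U+20AC), then "5"
my $chars = "na\x{EF}ve \x{20AC}5";      # character (codepoint) string
my $bytes = encode('UTF-8', $chars);     # the same text as UTF-8 octets

# Offset of the euro sign, counted in codepoints
my $cp_index = index($chars, "\x{20AC}");

# Offset of the euro sign, counted in bytes of the encoded buffer
my $byte_index = index($bytes, encode('UTF-8', "\x{20AC}"));

print "codepoint index: $cp_index\n";    # prints 6
print "byte index:      $byte_index\n";  # prints 7 (U+00EF takes two bytes)
print "lengths: ", length($chars), " vs ", length($bytes), "\n";  # 8 vs 11
```

Every multi-byte character ahead of a position pushes the two indices further apart, which is why a single integer offset is ambiguous unless you know which kind of string it refers to.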
Now I'm even more at a loss as to how to make Perl 5's Unicode handling more robust. Maybe a three-way flag (byte/codepoint/unknown) could be introduced; operations on incompatible types could then at least warn (probably with a warning that is not enabled by default), but not coerce. That would also provide at least some measure of introspection.
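As a rough sketch of what I mean by the three-way flag, here is a userland approximation. Everything in it is hypothetical: `TaggedStr`, its methods and `$WARN_ON_MIX` are names I made up for illustration, and nothing like this exists in core. It only tracks the tag alongside the value; the underlying `.` still behaves exactly as Perl's concatenation does today, so this shows the bookkeeping and the introspection, not a change in coercion.

```perl
package TaggedStr;   # hypothetical name, not an existing module
use strict;
use warnings;

our $WARN_ON_MIX = 0;   # assumption: the warning is off unless requested

sub new {
    my ($class, $value, $kind) = @_;
    $kind = 'unknown' unless defined $kind;
    die "bad kind '$kind'\n"
        unless $kind =~ /\A(?:byte|codepoint|unknown)\z/;
    return bless { value => $value, kind => $kind }, $class;
}

sub kind  { $_[0]{kind} }    # the introspection part
sub value { $_[0]{value} }

sub concat {
    my ($self, $other) = @_;
    my ($k1, $k2) = ($self->kind, $other->kind);

    # Warn only when two *known* kinds disagree, and only if enabled
    if ($WARN_ON_MIX && $k1 ne $k2 && $k1 ne 'unknown' && $k2 ne 'unknown') {
        warn "concatenating $k1 string with $k2 string\n";
    }

    # Matching kinds are preserved; anything else degrades to 'unknown'
    my $kind = $k1 eq $k2 ? $k1 : 'unknown';
    return TaggedStr->new($self->value . $other->value, $kind);
}

package main;

my $raw  = TaggedStr->new("\xE2\x82\xAC", 'byte');       # UTF-8 octets
my $text = TaggedStr->new("\x{20AC}",     'codepoint');  # decoded character

print $raw->concat($raw)->kind,  "\n";   # prints "byte"
print $raw->concat($text)->kind, "\n";   # prints "unknown"
```

Setting `$TaggedStr::WARN_ON_MIX = 1` before the second `concat` would emit the warning. Degrading mixed results to `unknown` rather than picking a side is just one possible choice for the sketch.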
Re^7: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
by BrowserUk (Patriarch) on Apr 19, 2012 at 22:33 UTC