in reply to Re^5: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
in thread Windows-1252 characters from \x{0080} thru \x{009f}

You are right: I didn't consider how indexing works in a buffer that contains multi-byte characters. There is an ugly solution for that, a new type of scalar that stores two numbers, one for the byte index and one for the codepoint index. But let's not go there.
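
For illustration only, here is a rough sketch of what such a two-number index might look like. It is written in Python rather than Perl, and the names `DualIndex` and `dual_index` are made up for this example; nothing like them exists in p5. It assumes the buffer is UTF-8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualIndex:
    """Hypothetical position that records both offsets into the same text."""
    byte: int       # offset into the UTF-8 encoded buffer
    codepoint: int  # offset counted in code points


def dual_index(text: str, codepoint: int) -> DualIndex:
    """Build a DualIndex for a given code point position in text."""
    # Encoding the prefix costs O(n); a real implementation would need a cache.
    byte = len(text[:codepoint].encode("utf-8"))
    return DualIndex(byte=byte, codepoint=codepoint)


text = "na\u00efve \u20ac5"   # U+00EF takes 2 bytes in UTF-8, U+20AC takes 3
buf = text.encode("utf-8")

pos = dual_index(text, text.index("5"))
print(pos)                                       # DualIndex(byte=10, codepoint=7)
print(chr(buf[pos.byte]), text[pos.codepoint])   # both views land on "5"
```

The two numbers drift apart by one for every extra byte in front of the position, which is exactly why a single integer cannot serve both views of the buffer.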

Now I'm even more at a loss on how to make p5's Unicode handling more robust. Maybe a three-way flag (byte/codepoint/unknown) could be introduced. Operations on incompatible types could then at least warn (probably with a warning that is not enabled by default) instead of coercing. The flag would also provide at least some measure of introspection.
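
To make the idea a bit more concrete, here is a minimal sketch of such a flag, again in Python and not a proposal for actual perl internals. `Kind`, `Tagged` and `EncodingMismatchWarning` are hypothetical names. That a mixed result falls back to "unknown", and that "unknown" operands never warn, are my assumptions rather than anything settled in this thread.

```python
import warnings
from enum import Enum


class Kind(Enum):
    BYTES = "bytes"
    CODEPOINTS = "codepoints"
    UNKNOWN = "unknown"


class EncodingMismatchWarning(UserWarning):
    """Hypothetical warning category for mixing bytes with code points."""


# Off by default, as suggested above; callers opt in explicitly.
warnings.simplefilter("ignore", EncodingMismatchWarning)


class Tagged:
    """A sequence of integers plus a three-way flag saying what they mean."""

    def __init__(self, data, kind=Kind.UNKNOWN):
        self.data = list(data)
        self.kind = kind          # introspection: just read the flag

    def __add__(self, other):
        if self.kind is other.kind:
            result = self.kind
        else:
            # Assumption: only a definite bytes/codepoints clash warns.
            if Kind.UNKNOWN not in (self.kind, other.kind):
                warnings.warn(
                    f"mixing {self.kind.value} with {other.kind.value}",
                    EncodingMismatchWarning,
                    stacklevel=2,
                )
            result = Kind.UNKNOWN
        # No coercion: the values are concatenated exactly as they are.
        return Tagged(self.data + other.data, result)


raw = Tagged(b"caf\xc3\xa9", Kind.BYTES)
txt = Tagged(map(ord, "caf\u00e9"), Kind.CODEPOINTS)
mixed = raw + txt                 # silent unless the warning is enabled
print(mixed.kind)
```

Enabling the warning would be a matter of `warnings.simplefilter("always", EncodingMismatchWarning)`, roughly analogous to turning on an optional warnings category in Perl.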

Re^7: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
by BrowserUk (Patriarch) on Apr 19, 2012 at 22:33 UTC
    Now I'm even more at a loss on how to make p5's Unicode handling more robust.

    There is an efficient, workable solution to this problem.

