in reply to Re^4: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
in thread Windows-1252 characters from \x{0080} thru \x{009f}

Yes, it is equivalent, but that doesn't make iso-8859-1 a default. A default implies a choice, something that can be changed. This is a side effect of a bug in the user's code, not a default.
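To make the equivalence concrete, here is a minimal Python sketch. It is an analogy, not Perl internals. When Perl upgrades a byte string that gets mixed with decoded text, it maps each byte to the code point with the same number. That produces exactly the same result as decoding the bytes as iso-8859-1, even though nobody chose that encoding. The helper `bytes_as_codepoints` is hypothetical.

```python
def bytes_as_codepoints(raw: bytes) -> str:
    """Hypothetical helper: treat each byte value 0..255 as the code point
    with the same number, with no decoding step at all."""
    return "".join(chr(b) for b in raw)


# Every possible byte value maps to the same code point under both views.
all_bytes = bytes(range(256))
assert bytes_as_codepoints(all_bytes) == all_bytes.decode("iso-8859-1")

# A UTF-8 encoded "É" that was never decoded becomes two code points,
# U+00C3 and U+0089, rather than the single U+00C9 the user meant.
raw = "É".encode("utf-8")
print([hex(ord(c)) for c in bytes_as_codepoints(raw)])  # ['0xc3', '0x89']
```

The iso-8859-1 look of the result comes from skipping the decode, not from any encoding being selected.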

It would require Perl to keep track of what is a byte and what is a codepoint

Even if you added a new type of data, I don't see how that would help. How can "É" match a byte? (Upd: Well, I suppose you could add a pragma to specify the encoding to use when Perl needs text from bytes, but wouldn't that break @- and pos? How would /g work then? What about captures? They currently only capture from the supplied string, but that would have to change. Unless you're suggesting that the data in the scalar actually changes when the decoding happens? Yeah, I've been working on this.)
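The offset problem shows up in a short Python sketch. Python stands in for Perl here because it keeps bytes and text as separate types. A text pattern cannot be applied to raw bytes at all. Once the bytes are decoded, match offsets count code points instead of bytes, so anything playing the role of `@-` or `pos` would have to pick one of the two. The sample string is only illustrative.

```python
import re

text = "café au lait"
raw = text.encode("utf-8")

# A text pattern cannot be matched against raw bytes without decoding first.
try:
    re.search("é", raw)
except TypeError as err:
    print("TypeError:", err)

# After decoding, match offsets count code points...
char_start = re.search("au", text).start()
# ...while the same match on the undecoded bytes counts bytes.
byte_start = re.search(rb"au", raw).start()

print(char_start, byte_start)  # 5 6: "é" is one code point but two bytes
```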

(And it should probably be "byte, decoded text or unknown", if only for backwards compatibility.)

Re^6: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
by moritz (Cardinal) on Apr 19, 2012 at 19:35 UTC

    You are right; I didn't consider how indexing works in a buffer that contains multi-byte characters. There is an ugly solution for that: a new type of scalar that stores two numbers, one for the byte index and one for the codepoint index. But let's not go there.
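    A minimal Python sketch of the "two numbers" idea, assuming the buffer holds well-formed UTF-8. A hypothetical `DualPos` carries both the byte offset and the code point offset. The hypothetical `advance` moves both together by reading UTF-8 lead bytes, so a known position can be moved forward without rescanning from the start of the string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualPos:
    """Hypothetical position type: both indexes refer to the same spot."""
    byte_index: int
    char_index: int


def utf8_width(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#04x}")


def advance(raw: bytes, pos: DualPos, n_chars: int) -> DualPos:
    """Move forward n_chars code points, updating both indexes together."""
    b = pos.byte_index
    for _ in range(n_chars):
        b += utf8_width(raw[b])
    return DualPos(b, pos.char_index + n_chars)


raw = "café au lait".encode("utf-8")
start = advance(raw, DualPos(0, 0), 5)
print(start)  # DualPos(byte_index=6, char_index=5)
print(raw[start.byte_index:start.byte_index + 2])  # b'au'
```

    The ugly part is visible even here. Every position carries two numbers that must stay in sync, and any edit to the string before a position invalidates both.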

    Now I'm even more at a loss on how to make p5's Unicode handling more robust. Maybe a three-way flag (byte/codepoint/unknown) could be introduced; operations on incompatible types could then at least warn (probably with a warning that is not enabled by default), but not coerce. It would also provide at least some measure of introspection.

      Now I'm even more at a loss on how to make p5's Unicode handling more robust.

      There is an efficient, workable solution to this problem.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?