in reply to Interventionist Unicode Behaviors

Another "feature" that's bitten me in the butt is the silent "upgrading" (i.e. corruption) of non-UTF8 scalars when concatenated with UTF8 scalars...
Then you want encoding::warnings.
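
To make the complaint concrete, here is a minimal sketch of the silent upgrade being discussed. It assumes the "non-UTF8 scalar" holds text that was already encoded as UTF-8 bytes; the variable names are illustrative and not from the original post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" already encoded as UTF-8: a two-byte string with the UTF8 flag off.
my $bytes = "\xc3\xa9";

# U+263A decoded into a character string, so the UTF8 flag is on.
my $chars = decode('UTF-8', "\xe2\x98\xba");

# Concatenation upgrades $bytes as if it were Latin-1: 0xC3 and 0xA9
# become the characters U+00C3 and U+00A9. No warning is issued.
my $joined = $bytes . $chars;

# Encoding the result for output shows the original bytes encoded twice.
my $hex = join ' ', map { sprintf '%02x', ord } split //, encode('UTF-8', $joined);
print "$hex\n";    # c3 83 c2 a9 e2 98 ba
```

The first four bytes are the double-encoded "é"; the last three are U+263A as intended.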
Think there's any chance these behaviors could change in Perl 5.10? Is it worth bringing up on p5p?
Sure, but keep in mind that a lot of very smart people have put a lot of thought into the existing behavior, and where there are flaws or caveats, alternative behavior was judged to be worse. If you want to propose a change, make sure you think through the possible drawbacks.

It sounds like you want the utf8 flag on handles to go away and just have the output encoding depend on perl's internal encoding of the data - this sounds very bad in a number of ways.

Re^2: Interventionist Unicode Behaviors
by creamygoodness (Curate) on Sep 08, 2006 at 08:07 UTC
    Then you want encoding::warnings.

    I can't foist that on people who use my CPAN library. The problem isn't me -- it's the userbase. Expertise varies. Some are extremely sophisticated. Most aren't.

    I do like that module, nevertheless. I think its behavior should be rolled into the core warnings pragma. It's probably too late for that now, though.

    It sounds like you want the utf8 flag on handles to go away and just have the output encoding depend on perl's internal encoding of the data

    No, that's not an accurate characterization. I would like filehandles -- particularly STDOUT -- to be encoding-agnostic by default. However, it should be possible to turn on encoding enforcement using the current mechanism.

    --
    Marvin Humphrey
    Rectangular Research ― http://www.rectangular.com
      It sounds like you want the utf8 flag on handles to go away and just have the output encoding depend on perl's internal encoding of the data
      No, that's not an accurate characterization. I would like filehandles -- particularly STDOUT -- to be encoding-agnostic by default. However, it should be possible to turn on encoding enforcement using the current mechanism.
      It sounded like you meant
      perl -we'$_ = "\xb1"; print; utf8::upgrade($_); print'
      should output three bytes (b1, then c2 b1), not two (if STDOUT is not utf8) or four (if STDOUT is utf8, e.g. with -CO). If that's not what you mean by "encoding-agnostic" (which I would describe as "the output depending on perl's internal encoding of the data"), then I'm not sure what you do mean.
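
The byte counts for the current behavior can be checked with a short script. This is only a sketch: it uses in-memory filehandles in place of STDOUT, and `bytes_written` is a hypothetical helper name introduced for the illustration.

```perl
use strict;
use warnings;

# Returns, as hex, the bytes that a handle opened with the given layer receives.
sub bytes_written {
    my ($layer) = @_;
    my $buf = '';
    open my $fh, ">$layer", \$buf or die "open failed: $!";
    my $s = "\xb1";        # one character, stored internally as one byte
    print {$fh} $s;
    utf8::upgrade($s);     # same character, now stored internally as two bytes
    print {$fh} $s;
    close $fh;
    return join ' ', map { sprintf '%02x', ord } split //, $buf;
}

print bytes_written(''), "\n";         # plain byte handle: b1 b1
print bytes_written(':utf8'), "\n";    # utf8 handle, as with -CO: c2 b1 c2 b1
```

In both cases the handle's layer decides the output, not how the scalar happens to be stored. Output that depended on the internal encoding would instead be b1 c2 b1.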