in reply to Behaviour of Encode::decode_utf8 on ASCII

5.8.0 and 5.8.8 both return the same result for me.

is tagged as a unicode string
This is perl, v5.8.0 built for MSWin32-x86-multi-thread Binary build 806 provided by ActiveState Corp. Built 00:45:44 Mar 31 2003 Encode 1.83
This is perl, v5.8.8 built for MSWin32-x86-multi-thread Binary build 817 [257965] provided by ActiveState Built Mar 20 2006 17:54:25 Encode 2.12

It would make no sense for it not to be tagged. When one asks to decode a string of bytes (UTF8 off) into a string of chars (UTF8 on), it makes no sense for the same call to return a string of chars sometimes and a string of bytes at other times.
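
For anyone who wants to repeat the check on their own build, here is a minimal sketch. It is not the exact script I ran, and `flag_state` is just a name made up for this example. It decodes an ASCII-only byte string and reports whether the result carries the UTF8 flag, using Encode::is_utf8.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper (not part of Encode): describe the UTF8 flag state.
sub flag_state {
    my ($str) = @_;
    return Encode::is_utf8($str)
        ? 'is tagged as a unicode string'
        : 'is not tagged as a unicode string';
}

my $bytes = "hello";               # ASCII-only octets, UTF8 flag off
my $chars = decode_utf8($bytes);   # decode the octets into characters

print "input  ", flag_state($bytes), "\n";
print "output ", flag_state($chars), "\n";
print "Encode $Encode::VERSION\n";
```

Whatever the second line reports on a given install, the decoded string should still compare equal to the original bytes, since every character is in the ASCII range.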

I'd say the bug is in the docs.

Re^2: Behaviour of Encode::decode_utf8 on ASCII
by jbert (Priest) on Feb 14, 2007 at 20:02 UTC
    Except there is the issue of efficiency (see my other post above). Representing a string of characters which all happen to lie within the ASCII range as an untagged byte string allows the byte-oriented regex engine to be used.

    It's a very similar idea to using machine words to hold integers up to a certain value, and then switching to a different representation for bignums. It doesn't make a difference to correctness, but it does make a difference to performance.
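
    A small sketch of the "no difference to correctness" point, assuming Perl 5.8.1 or later for utf8::is_utf8 (the variable names are only illustrative). The same ASCII text is held once untagged and once tagged, and the usual string operations cannot tell them apart:

```perl
use strict;
use warnings;

my $plain  = "war and peace";   # ASCII-only, UTF8 flag off
my $tagged = $plain;
utf8::upgrade($tagged);         # same characters, UTF8 flag now on

printf "plain  flag: %d\n", utf8::is_utf8($plain)  ? 1 : 0;
printf "tagged flag: %d\n", utf8::is_utf8($tagged) ? 1 : 0;

# Only the internal representation differs; the strings behave the same.
print "equal\n"       if $plain eq $tagged;
print "same length\n" if length($plain) == length($tagged);
print "both match\n"  if $plain =~ /peace/ && $tagged =~ /peace/;
```

    This only shows that the results agree. It says nothing about speed, which would need a benchmark on the Perl build in question.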

      To repeat what I said elsewhere, IMO this is a bug that should be reported.

      I'd do it for you, but using perlbug on win32 is a real PITA.

      ---
      $world=~s/war/peace/g