in reply to Setting UTF-8 mode on filehandle reads?

Perl's current strategy on UTF-8 is to make it work with the least modification. perlunicode clearly states that Perl does not implement the Unicode standard from cover to cover. Another Perl doc, I forget which one, says that Perl will remain like this until Unicode is inescapable. This is something we have to be aware of all the time.

Perl's way of handling Unicode I/O is good evidence of this strategy. A layer called ':utf8' was inserted between your program and your file descriptors. I would expect this to be totally revised, when Perl 6 comes out, otherwise I feel some real worry here.

To make this ':utf8' layer work, you have to add it explicitly. It is also separate from the 'utf8' pragma: that pragma does not affect your I/O at all.

For the ':utf8' layer itself:
Examples
at open: open(FILEHANDLE, "<:utf8", "abc.utf8.txt")
after open: binmode(STDOUT, ":utf8")
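
To make the difference concrete, here is a minimal sketch of both forms. It assumes Perl 5.8 or later; the scratch file from File::Temp and the sample string are made up for illustration. It writes a string through the ':utf8' layer, then reads the file back once as raw bytes and once through the layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Six characters: c, a, f, e-acute, space, U+263A (smiley).
my $text = "caf\x{e9} \x{263A}";

# At open: write through the :utf8 layer so characters become UTF-8 bytes.
open(my $out, ">:utf8", $path) or die "cannot write $path: $!";
print {$out} $text;
close $out;

# Read back with no decoding layer: Perl sees the raw bytes.
open(my $raw, "<", $path) or die "cannot read $path: $!";
binmode($raw);
my $bytes = do { local $/; <$raw> };
close $raw;

# Read back through the layer: the bytes are decoded into characters.
open(my $in, "<:utf8", $path) or die "cannot read $path: $!";
my $chars = do { local $/; <$in> };
close $in;

# After open: add the layer to a handle that already exists.
binmode(STDOUT, ":utf8");
print length($bytes), " bytes, ", length($chars), " characters\n";
# prints "9 bytes, 6 characters"
```

Note that nothing in this sketch says 'use utf8'; the layer alone decides how the data on the handle is interpreted.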
Re: Re: Setting UTF-8 mode on filehandle reads?
by grantm (Parson) on Dec 07, 2002 at 01:47 UTC
    I would expect this to be totally revised, when Perl 6 comes out, otherwise I feel some real worry here.

    I'm not quite sure what you're getting at here. You will always need to tell Perl that you want it to use UTF-8 encoding when you read a specific file. Sure, in the future some of the region-specific encodings such as Latin-1 might lose popularity to Unicode. But if Perl assumed every file was a UTF-8 character stream, then Perl would no longer be able to read binary byte streams (or even UTF-16 encoded files).
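
As a rough illustration of why the encoding has to be stated per file, the sketch below reads the same file twice: once as a binary byte stream and once as UTF-16 text. It assumes Perl 5.8 or later with the Encode-backed ':encoding(...)' layer available; the scratch file and sample string are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Five characters, one of them (i-diaeresis) outside ASCII.
my $text = "na\x{ef}ve";

# Write the string as UTF-16 rather than UTF-8.
open(my $out, ">:encoding(UTF-16)", $path) or die "cannot write $path: $!";
print {$out} $text;
close $out;

# Binary read: no decoding, just the octets stored on disk.
open(my $bin, "<:raw", $path) or die "cannot read $path: $!";
my $octets = do { local $/; <$bin> };
close $bin;

# Text read: only works because we name the encoding ourselves.
open(my $in, "<:encoding(UTF-16)", $path) or die "cannot read $path: $!";
my $decoded = do { local $/; <$in> };
close $in;

print "round trip ok\n" if $decoded eq $text;
print length($octets), " octets on disk for ", length($decoded), " characters\n";
```

Had the second read assumed UTF-8 instead, the same octets would have been misinterpreted, which is the point being made above.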

    The XML spec provides a way for a program to unambiguously determine the encoding of an XML document. In the absence of this type of in-band information in other text file formats, you will need to specify an encoding.

    As you point out, 5.8 provides the very powerful IO layer model for dealing with this and other problems. I don't expect IO layers to disappear in 6.0. And for people stuck with 5.6, pack hacks do provide a workaround.

    What is expected to change in the future is that Perl will assume your script itself is UTF-8 encoded. Assuming you use a UTF-8 aware editor, that will allow you to include non-ASCII characters in string literals simply by typing them. At the moment, if you want to do that, you have to say 'use utf8'; in the future that will be assumed (and to quote the docs, "'use utf8' will become a noop").
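
A small sketch of what 'use utf8' changes today, assuming the script file itself is saved as UTF-8 (the variable names are made up for illustration). The pragma is lexically scoped, so the same literal can be compared with and without it.

```perl
use strict;
use warnings;

my ($with, $without);

{
    use utf8;               # the source text in this block is treated as UTF-8
    $with = length("é");    # the literal is one character
}

{
    no utf8;                # the source text in this block is treated as bytes
    $without = length("é"); # the same literal is two bytes
}

print "with 'use utf8': $with, without: $without\n";
# prints "with 'use utf8': 1, without: 2" when the file is saved as UTF-8
```

This only affects how the script's own text is read; it adds no layer to any filehandle.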

      "You will always need to tell Perl that you want it to use UTF-8 encoding" Unless that's the default. Of course, if you want to read a binary file, you can always ask for that, but there's no reason in this day and age that UTF8 encoding wouldn't be the default.