```perl
#!/usr/bin/env perl

use 5.012;      # want unicode strings!
use utf8;
use strict;
use autodie;
use warnings;   # defer FATAL till runtime
use open        qw< :std :utf8 >;
use charnames   qw< :full >;

use File::Basename qw< basename >;
use Carp           qw< carp croak confess cluck >;

$0 = basename($0);   # shorter messages
$| = 1;

binmode(DATA, ":utf8");

# give a full stack dump on any untrapped exceptions
$SIG{__DIE__} = sub {
    confess "Uncaught exception: @_" unless $^S;
};

# now promote run-time warnings into stackdumped exceptions
# *unless* we're in a try block, in which
# case just generate a clucking stackdump instead
$SIG{__WARN__} = sub {
    if ($^S) { cluck   "Trapped warning: @_" }
    else     { confess "Deadly warning: @_"  }
};
```

But that suffers from a couple of bugs. There is a bug in the implementation of autodie that screws up the layers imposed by use open. Witness:

```
% perl -e 'use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
0 1 2 /tmp/out
% perl -e 'use autodie; use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
0 1 1 /tmp/out
% perl -e 'use open qw(:std :utf8); use autodie; open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
0 1 1 /tmp/out
```

Without autodie, U+00DF goes out as two UTF‑8 bytes, which is correct. With autodie loaded, in either order, only one byte is written, so the :utf8 layer has been silently lost.

The other problem is that use utf8 doesn’t really work well on globals, because there are issues with how the package symbol tables are accessed as byte strings. There is also an issue of what to do about something like:

```perl
use Weather::El_Niño;
```

That has to map to the filesystem, and now what do you do? Just use the bytes as they are? Normalize to UTF‑8? Downgrade to Latin1 (which it might have already been)? Did you know that (for very good reasons) the Darwin HFS+ filesystem always converts filenames into NFD, their canonically decomposed form? So be careful when checking filenames! You can’t just say:
```perl
@files = grep { /Niñ/ } glob("{El,La}_*");
```
Because if you input it in the normal way, your pattern is going to have a U+00F1 LATIN SMALL LETTER N WITH TILDE there (which is the NFC form) but the results from the filesystem will have an "n" followed by U+0303 COMBINING TILDE (the NFD version), which is suddenly two separate code points, not one.
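One way around the mismatch is to bring both sides into the same normalization form before comparing. The following is only a minimal sketch: it uses the core Unicode::Normalize module, and the filenames are made up and built in memory to imitate what a decomposing filesystem would hand back, so it runs anywhere. The ñ is spelled as `\x{F1}` so that the example does not depend on how this text itself happens to be normalized.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical filenames, decomposed the way an NFD filesystem would
# report them: "n" followed by U+0303 COMBINING TILDE.
my @from_fs = map { NFD($_) } ("El_Ni\x{F1}o", "La_Ni\x{F1}a", "El_Paso");

# The pattern as normally typed holds the single code point U+00F1 (NFC),
# so it cannot match the two-code-point sequence in the names.
my @naive = grep { /Ni\x{F1}/ } @from_fs;

# Recompose each name to NFC first, then match.
my @fixed = grep { NFC($_) =~ /Ni\x{F1}/ } @from_fs;

printf "naive: %d, normalized: %d\n", scalar @naive, scalar @fixed;
# prints "naive: 0, normalized: 2"
```

With a real glob the names come back as undecoded bytes, so they would also need decoding (for example with Encode's `decode("UTF-8", $name)`) before normalizing; that step is left out here on the assumption that the names are already character strings.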
There is a Google Summer of Code project for cleaning up Perl’s tokenizer vis‐à‐vis 8‑bit names, including for UTF‑8. I am convinced that this can and shall be fixed.
I see my stalker is back. Yawn!
In reply to Re^3: 'use' inside or outside of package declaration?
by tchrist
in thread 'use' inside or outside of package declaration?
by John M. Dlugosz