
I'm putting
use 5.10.1; use utf8;
as the very first things. I'm not afraid of very old perls not liking the dotted syntax since the installer already checked the version once. I agree, this documents the version I wrote it under and tested it under, and can ensure some backward compatibility if even newer ones changed things.

The meaning of utf8 in modern versions is simply to state that the source file is UTF-8. That really really affects the entire file, not a block, not a package.
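A minimal sketch of that point, assuming the file is saved as UTF-8. Strictly speaking the pragma is lexically scoped, so a single `use utf8` at the top covers the rest of the file no matter how many `package` lines follow. The package names `Foo` and `Bar` are made up for illustration.

```perl
use utf8;        # assumption: this source file is saved as UTF-8
use strict;
use warnings;

package Foo;
# The literal is decoded to one character because of the file-level pragma.
sub len { return length("ñ") }

package Bar;
# A new package does not reset the pragma; same result here.
sub len { return length("ñ") }

package main;
printf "Foo: %d, Bar: %d\n", Foo::len(), Bar::len();
```

Without `use utf8`, the same literal would be seen as two raw bytes and `length` would report 2 in both packages.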

Based on what I learned on this thread, I'll put other stuff after the module's package line. I'm trying autodie for example, and I had supposed that to be more global and not per-package, but it was on your list.

As for the referenced rant/editorial concerning warnings on production servers, I think that ought to be fixed once and for all in the logging system. Just piping stderr to a file and archiving that file is rather brute simple. I've worked on systems where logging was more engineered. At the very least it has quotas with purging so it won't fill up the disk!

Re^3: 'use' inside or outside of package declaration?
by tchrist (Pilgrim) on May 12, 2011 at 11:56 UTC
    My current semi-working boilerplate for new programs tends to look like this:
    #!/usr/bin/env perl

    use 5.012;      # want unicode strings!
    use utf8;
    use strict;
    use autodie;
    use warnings;   # defer FATAL till runtime
    use open qw< :std :utf8 >;
    use charnames qw< :full >;

    use File::Basename qw< basename >;
    use Carp qw< carp croak confess cluck >;

    $0 = basename($0);   # shorter messages
    $| = 1;

    binmode(DATA, ":utf8");

    # give a full stack dump on any untrapped exceptions
    $SIG{__DIE__} = sub {
        confess "Uncaught exception: $@" unless $^S;
    };

    # now promote run-time warnings into stackdumped exceptions
    # *unless* we're in an try block, in which
    # case just generate a clucking stackdump instead
    $SIG{__WARN__} = sub {
        if ($^S) { cluck   "Trapped warning: @_" }
        else     { confess "Deadly warning: @_"  }
    };
    But that suffers from a couple of bugs. There is a bug in the implementation of autodie that screws up the layers imposed by use open. Witness:
    % perl -e 'use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
           0       1       2 /tmp/out
    % perl -e 'use autodie; use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"' ; wc /tmp/out
           0       1       1 /tmp/out
    % perl -e ' use open qw(:std :utf8); use autodie; open(F, ">/tmp/out"); print F "\xDF"' ; wc /tmp/out
           0       1       1 /tmp/out
    The other problem is that use utf8 doesn’t really work well on globals, because there are issues with how the package symbol tables are accessed as byte strings. There is also an issue of what to do about something like:
    use Weather::El_Niño;
    That has to map to the filesystem, and now what do you do? Just use the bytes as they are? Normalize to UTF‑8? Downgrade to Latin1 (which it might have already been)? Did you know that (for very good reasons) the Darwin HFS+ filesystem always converts filenames into NFD, their canonically decomposed form? So be careful when checking filenames!! You can’t just say:
    @files = grep { /Niñ/ } glob("{El,La}_*");
    
    Because if you input it in the normal way, your pattern is going to have a U+00F1 LATIN SMALL LETTER N WITH TILDE there (which is the NFC form) but the results from the filesystem will have an "n" followed by U+0303 COMBINING TILDE (the NFD version), which is suddenly two separate code points, not one.
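One way to sidestep the mismatch is to bring both the pattern and the names into the same normalization form before comparing. The sketch below only simulates the filesystem: the file names are hypothetical and are decomposed with `NFD` to stand in for what HFS+ would return. With a real `glob` you would also have to decode the returned bytes from UTF-8 first.

```perl
use strict;
use warnings;
use utf8;                               # assumption: file saved as UTF-8
use Unicode::Normalize qw(NFD);

# Hypothetical names, decomposed the way an NFD filesystem would hand them back.
my @from_fs = map { NFD($_) } ("El_Niño.pm", "La_Niña.pm", "El_Paso.pm");

# Naive match: the pattern holds U+00F1, the names hold "n" + U+0303.
my @naive = grep { /Niñ/ } @from_fs;

# Normalize the pattern and each name to the same form before matching.
my $pattern = NFD("Niñ");
my @fixed   = grep { NFD($_) =~ /\Q$pattern\E/ } @from_fs;

printf "naive matches: %d, normalized matches: %d\n",
    scalar(@naive), scalar(@fixed);
```

Normalizing both sides to NFC instead would work equally well; what matters is that the two sides agree.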

    There is a Google Summer of Code project for cleaning up Perl’s tokenizer vis‐à‐vis 8‑bit names, including for UTF‑8. I am convinced that this can and shall be fixed.


    I see my stalker is back. Yawn!

      Just a note: Darwin HFS+ uses NFD (with some deviations), not NFC. I've been bitten by it, you cannot even do
      touch á
      cat á
      in the shell.
        Yes, sorry, I wrote NFC but described NFD. I’ll fix it. However, your example of something that doesn’t work, does. Watch:
        % uniquote -v t
        echo foo > \N{LATIN SMALL LETTER A WITH ACUTE}
        cat \N{LATIN SMALL LETTER A WITH ACUTE}
        % sh /tmp/t
        foo
        % ls a? | uniquote -v
        a\N{COMBINING ACUTE ACCENT}
        It has to work that way, because the same NFC conversion takes place for all filenames passed to open. But I do know what you mean. It depends on the syscall. Apparently stat doesn’t do that, since there you have to do it yourself:
        % perl -le 'print -e "\xE1" ? "Yes" : "No"'
        No
        % perl -MUnicode::Normalize -le 'print -e NFC("\xE1") ? "Yes" : "No"'
        Yes



      I'll study that some more later in the day.

      use strict is redundant since 5.12 includes that.
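A quick way to see this for yourself, assuming Perl 5.12 or later. The sketch deliberately leaves out `use strict` at file level so that the string evals start without strictures; the variable names are illustrative only.

```perl
use warnings;
# Deliberately no "use strict" here: the string evals below must start
# from a scope where strictures are off.

# No version declaration: an undeclared package variable compiles fine.
my $plain_ok = eval q{ $undeclared_a = 1; 1 } ? 1 : 0;

# "use 5.012" switches strict on, so the same kind of code fails to compile.
my $v512_ok = eval q{ use 5.012; $undeclared_b = 1; 1 } ? 1 : 0;
my $error   = $@;

print "without version: $plain_ok, with use 5.012: $v512_ok\n";
print "error was: $error" if $error;
```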

      As for file names, don't forget Windows uses UTF-16 in its API. The characters have well-defined meanings (not just bytes), but it still suffers from normalization issues.

      binmode(DATA, ":utf8"); That's not implied by the utf8 pragma?

        I put the strict there even when using 5.12 or above because it documents what’s going on, and because I alas find myself downgrading to 5.10.1 now and again, and don’t want to lose things.

        In general, I’m for several reasons opposed to non‐pragmas diddling their caller’s scope’s lexical hints in non‐obvious ways unrelated to that module’s purpose; that’s just too Acme:: for my tastes. But I do consider use 5.012 a pragma — that is, a compiler declaration that can alter the rules of engagement.

        And no, the necessary binmoding of DATA is triggered neither by use utf8 nor by use open ":utf8". Go figger. 😾
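The effect of the missing layer can be sketched without a real `__DATA__` section. Here an in-memory filehandle stands in for DATA (an assumption made purely to keep the example self-contained); the `binmode` call is the same one the boilerplate applies.

```perl
use strict;
use warnings;

# UTF-8 bytes for "ñ" plus a newline, standing in for a line of DATA.
my $bytes = "\xC3\xB1\n";

# No layer: the line comes back as three undecoded bytes.
open(my $raw, '<', \$bytes) or die "open: $!";
my $undecoded = <$raw>;
close $raw;

# With the layer applied, the same line is two characters.
open(my $fh, '<', \$bytes) or die "open: $!";
binmode($fh, ":utf8");
my $decoded = <$fh>;
close $fh;

printf "raw length: %d, decoded length: %d\n",
    length($undecoded), length($decoded);
```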

        I have as little to do with UTF‑16 as I possibly can. 🙈 🙉 🙊 Anything that makes me deal with individual code units is such a lose that I just want to kick the people who afflicted the world with this idiocy. Aren’t you glad we don’t have to count code units in Perl? 😹
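To make the code unit versus code point distinction concrete, here is a small sketch using the core Encode module. The character is U+1F639, which lies outside the Basic Multilingual Plane and so needs a surrogate pair in UTF-16.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $cat = "\x{1F639}";    # CAT FACE WITH TEARS OF JOY

# Perl's length() counts code points in a character string.
my $code_points = length($cat);

# UTF-16 code units are two bytes each; UTF-16LE is used so no BOM is added.
my $utf16_units = length(encode('UTF-16LE', $cat)) / 2;

# For comparison, the number of bytes in the UTF-8 encoding.
my $utf8_bytes = length(encode('UTF-8', $cat));

printf "code points: %d, UTF-16 code units: %d, UTF-8 bytes: %d\n",
    $code_points, $utf16_units, $utf8_bytes;
```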

