Re^2: 'use' inside or outside of package declaration?

I'm putting

use 5.10.1;
use utf8;

as the very first things. I'm not afraid of very old perls not liking the dotted syntax, since the installer already checked the version once. I agree: this documents the version I wrote and tested it under, and it can ensure some backward compatibility if even newer versions change things.

The meaning of utf8 in modern versions is simply to state that the source file is UTF-8. That really really affects the entire file, not a block, not a package.
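A minimal sketch of what that declaration changes, assuming the script itself is saved as UTF-8 (the variable names are illustrative only). With `use utf8` in effect, a literal is read as characters rather than as whatever bytes are on disk:

```perl
use strict;
use warnings;
use utf8;    # declares that the source text below is UTF-8

my $word  = "Niño";          # read as four characters under `use utf8`
my $chars = length $word;

# Encode a copy to see how many bytes the same text occupies in UTF-8.
my $encoded = $word;
utf8::encode($encoded);
my $bytes = length $encoded;

binmode(STDOUT, ":encoding(UTF-8)");
print "$word: $chars characters, $bytes bytes\n";
```

The character count and the byte count differ because the ñ takes two bytes in UTF-8.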

Based on what I learned on this thread, I'll put other stuff after the module's package line. I'm trying autodie for example, and I had supposed that to be more global and not per-package, but it was on your list.

As for the referenced ~~rant~~editorial concerning warnings in production servers, I think that ought to be fixed once and for all in the logging system. Just piping stderr to a file and archiving that file is rather brute simple. I've worked on systems where logging was more engineered. At the very least it has quotas with purging so it won't fill up the disk!

Re^3: 'use' inside or outside of package declaration? by tchrist (Pilgrim) on May 12, 2011 at 11:56 UTC
My current semi-working boilerplate for new programs tends to look like this:

    #!/usr/bin/env perl

    use 5.012;      # want unicode strings!
    use utf8;
    use strict;
    use autodie;
    use warnings;   # defer FATAL till runtime
    use open qw< :std :utf8 >;
    use charnames qw< :full >;

    use File::Basename qw< basename >;
    use Carp qw< carp croak confess cluck >;

    $0 = basename($0);   # shorter messages
    $| = 1;

    binmode(DATA, ":utf8");

    # give a full stack dump on any untrapped exceptions
    $SIG{__DIE__} = sub {
        confess "Uncaught exception: $@" unless $^S;
    };

    # now promote run-time warnings into stackdumped exceptions
    # unless we're in a try block, in which
    # case just generate a clucking stackdump instead
    $SIG{__WARN__} = sub {
        if ($^S) { cluck   "Trapped warning: @_" }
        else     { confess "Deadly warning: @_"  }
    };

But that suffers from a couple of bugs. There is a bug in the implementation of `autodie` that screws up the layers imposed by `use open`. Witness:

    % perl -e 'use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
    0 1 2 /tmp/out
    % perl -e 'use autodie; use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
    0 1 1 /tmp/out
    % perl -e 'use open qw(:std :utf8); use autodie; open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
    0 1 1 /tmp/out

The other problem is that `use utf8` doesn’t really work well on globals, because there are issues with how the package symbol tables are accessed as byte strings. There is also an issue of what to do about something like:

    use Weather::El_Niño;

That has to map to the filesystem, and now what do you do? Just use the bytes as they are? Normalize to UTF‑8? Downgrade to Latin1 (which it might have already been)? Did you know that (for very good reasons) the Darwin HFS+ filesystem always converts filenames into NFD, their canonically decomposed form? So be careful when checking filenames!! You can’t just say:

    @files = grep { /Niñ/ } glob("{El,La}_*");

Because if you input it in the normal way, your pattern is going to have a U+00F1 `LATIN SMALL LETTER N WITH TILDE` there (which is the NFC form), but the results from the filesystem will have an "n" followed by U+0303 `COMBINING TILDE` (the NFD version), which is suddenly two separate code points, not one.

There is a Google Summer of Code project for cleaning up Perl’s tokenizer vis‐à‐vis 8‑bit names, including for UTF‑8. I am convinced that this can and shall be fixed.

I see my stalker is back. Yawn!
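One hedged way around the NFC-versus-NFD mismatch is to normalize the names before matching. This is only a sketch: the `@names` list stands in for directory entries that have already been decoded into character strings (a real `glob` returns bytes that would need decoding first), and the names themselves are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Stand-ins for decoded directory entries as an NFD filesystem would return them.
my @names = map { NFD($_) } ("El_Ni\x{F1}o", "La_Ni\x{F1}a", "El_Paso");

# The pattern holds U+00F1, the NFC form you would normally type.
my $pattern = qr/Ni\x{F1}/;

# Naive match: an NFC pattern against NFD names finds nothing.
my @naive = grep { $_ =~ $pattern } @names;

# Normalize each name to NFC first, then match.
my @fixed = grep { NFC($_) =~ $pattern } @names;

printf "naive: %d match(es), normalized: %d match(es)\n",
    scalar @naive, scalar @fixed;
```

Normalizing both sides to the same form (either one) is what matters; NFC is used here only because the pattern was typed that way.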
Re^4: 'use' inside or outside of package declaration? by choroba (Cardinal) on May 12, 2011 at 12:07 UTC
Just a note: Darwin HFS+ uses NFD (with some deviations), not NFC. I've been bitten by it; you cannot even do

    touch á
    cat á

in the shell.
Re^5: 'use' inside or outside of package declaration? by tchrist (Pilgrim) on May 12, 2011 at 12:16 UTC
Yes, sorry, I wrote NFC but described NFD. I’ll fix it. However, your example of something that doesn’t work, does. Watch:

    % uniquote -v t
    echo foo > \N{LATIN SMALL LETTER A WITH ACUTE}
    cat \N{LATIN SMALL LETTER A WITH ACUTE}
    % sh /tmp/t
    foo
    % ls a? | uniquote -v
    a\N{COMBINING ACUTE ACCENT}

It has to work that way, because the same NFD conversion takes place for all filenames passed to `open`. But I do know what you mean. It depends on the syscall. Apparently `stat` doesn’t do that, since there you have to do it yourself:

    % perl -le 'print -e "\xE1" ? "Yes" : "No"'
    No
    % perl -MUnicode::Normalize -le 'print -e NFC("\xE1") ? "Yes" : "No"'
    Yes
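To take the guesswork out of which bytes actually reach the system call, one option is to normalize and encode the name explicitly before handing it to a file test. The sketch below only prints the bytes each form produces. The `$hex` helper is hypothetical, and which form a given filesystem wants is an assumption you would have to check.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC NFD);

my $name = "\x{E1}";    # LATIN SMALL LETTER A WITH ACUTE, one code point

# Explicit UTF-8 bytes for each normalization form.
my $nfc_bytes = encode('UTF-8', NFC($name));    # precomposed
my $nfd_bytes = encode('UTF-8', NFD($name));    # "a" plus COMBINING ACUTE ACCENT

# Hypothetical helper: render a byte string as space-separated hex.
my $hex = sub { join ' ', map { sprintf '%02X', ord } split //, shift };

printf "NFC: %s\n", $hex->($nfc_bytes);
printf "NFD: %s\n", $hex->($nfd_bytes);

# A file test would then be given the bytes directly, e.g. -e $nfd_bytes
```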
Re^6: 'use' inside or outside of package declaration? by choroba (Cardinal) on May 12, 2011 at 12:53 UTC
Re^4: 'use' inside or outside of package declaration? by John M. Dlugosz (Monsignor) on May 12, 2011 at 12:34 UTC
I'll study that some more later in the day.

`use strict` is redundant, since `use 5.012` already enables it.

As for file names, don't forget Windows uses UTF-16 in its API. The characters have well-defined meanings (not just bytes), but it still suffers from normalization issues.

`binmode(DATA, ":utf8");`

That's not implied by the utf8 pragma?
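A small sketch of the implicit strictures, for anyone who wants to see them in action. It deliberately has no `use strict` at file level, so each string `eval` starts out non-strict; the variable names are made up for the demonstration.

```perl
# No `use strict` here on purpose: each eval below begins without strictures.

# Without a version declaration, an undeclared global is accepted.
my $without = eval q{ $undeclared_a = 1; 1 };

# With `use 5.012`, strictures are switched on and compilation fails.
my $with  = eval q{ use 5.012; $undeclared_b = 1; 1 };
my $error = $@;

print "no version declaration: ", ($without ? "compiles" : "fails"), "\n";
print "use 5.012:              ", ($with    ? "compiles" : "fails"), "\n";
print $error unless $with;
```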
Re^5: 'use' inside or outside of package declaration? by tchrist (Pilgrim) on May 12, 2011 at 12:45 UTC
I put the `strict` there even when using 5.12 or above because it documents what’s going on, and because I alas find myself downgrading to 5.10.1 now and again, and don’t want to lose things. In general, I’m for several reasons opposed to non‐pragmas diddling their caller’s scope’s lexical hints in non‐obvious ways unrelated to that module’s purpose; that’s just too `Acme::` for my tastes. But I do consider `use 5.012` a pragma, that is, a compiler declaration that can alter the rules of engagement.

And no, the necessary `binmode`ing of `DATA` is triggered neither by `use utf8` nor by `use open ":utf8"`. Go figger. 😾

I have as little to do with UTF‑16 as I possibly can. 🙈 🙉 🙊 Anything that makes me deal with individual code units is such a lose that I just want to kick the people who afflicted the world with this idiocy. Aren’t you glad we don’t have to count code units in Perl? 😹
Re^6: 'use' inside or outside of package declaration? by John M. Dlugosz (Monsignor) on May 12, 2011 at 12:57 UTC
Re^7: 'use' inside or outside of package declaration? by tchrist (Pilgrim) on May 12, 2011 at 14:37 UTC
Some notes below your chosen depth have not been shown here