
My current semi-working boilerplate for new programs tends to look like this:

#!/usr/bin/env perl

use 5.012;  # want unicode strings!
use utf8;
use strict;
use autodie;
use warnings;  # defer FATAL till runtime 
use open        qw< :std  :utf8   >;
use charnames   qw< :full >;

use File::Basename      qw< basename >;
use Carp                qw< carp croak confess cluck >;

$0 = basename($0);  # shorter messages
$| = 1;                  # autoflush STDOUT
binmode(DATA, ":utf8");  # :std only covers STDIN, STDOUT, and STDERR

# give a full stack dump on any untrapped exceptions
$SIG{__DIE__} = sub {
    confess "Uncaught exception: @_" unless $^S;  # message arrives in @_, not $@
};

# now promote run-time warnings into stack-dumped exceptions
#   *unless* we're in a try block (an eval), in which
#   case just generate a clucking stack dump instead
$SIG{__WARN__} = sub {
    if ($^S) { cluck   "Trapped warning: @_" } 
    else     { confess "Deadly warning: @_"  }
};
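To check what the $^S test in that __WARN__ hook buys you, here is a minimal sketch that installs the same hook in a child perl (via $^X, the running interpreter) and compares exit statuses: a warning inside an eval should only cluck, while the same warning outside any eval should be fatal. The helper exit_status_of is hypothetical, made up for this demo.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The same __WARN__ policy as the boilerplate, as source text for a child perl.
my $hook = q{
    use Carp qw< cluck confess >;
    $SIG{__WARN__} = sub {
        if ($^S) { cluck   "Trapped warning: @_" }
        else     { confess "Deadly warning: @_"  }
    };
};

# Hypothetical helper: run $hook plus $body in a fresh perl, return its exit code.
sub exit_status_of {
    my ($body) = @_;
    system($^X, '-e', $hook . $body);   # list form, so no shell quoting issues
    return $? >> 8;
}

my $inside  = exit_status_of(q{ eval { warn "inside\n" }; exit 0 });
my $outside = exit_status_of(q{ warn "outside\n"; exit 0 });

print "warning inside eval:  exit $inside\n";    # cluck: keeps running
print "warning outside eval: exit $outside\n";   # confess: dies
```

Both stack dumps go to STDERR; only the exit codes differ. The exact nonzero code for the fatal case depends on $! at the moment of death, so it is not guaranteed to be 255.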

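But that suffers from a couple of bugs. First, there is a bug in the implementation of autodie that clobbers the layers imposed by use open. Witness how "\xDF" (ß), which takes two bytes in UTF-8, comes out as a single byte as soon as autodie is loaded, in either order: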

% perl -e 'use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"'; wc /tmp/out
       0       1       2 /tmp/out
% perl -e 'use autodie; use open qw(:std :utf8); open(F, ">/tmp/out"); print F "\xDF"' ; wc /tmp/out
       0       1       1 /tmp/out
% perl -e ' use open qw(:std :utf8); use autodie; open(F, ">/tmp/out"); print F "\xDF"' ; wc /tmp/out
       0       1       1 /tmp/out
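One way to sidestep this, assuming you are willing to name the layer at each open call, is to put the encoding in the three-argument open mode itself. The open pragma's documentation says explicit layers override the ones it declares, so the result no longer depends on whether, or in what order, autodie was loaded. A minimal sketch (the temporary file is just for the demo):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use open qw< :std :utf8 >;
use autodie;                       # loaded after "use open", as in the failing run
use File::Temp qw< tempfile >;

# Throwaway file, removed automatically when the program exits.
my (undef, $path) = tempfile(UNLINK => 1);

# Name the layer explicitly instead of relying on the lexical default.
open(my $out, '>:encoding(UTF-8)', $path);
print {$out} "\xDF";               # U+00DF LATIN SMALL LETTER SHARP S
close($out);

printf "%s holds %d bytes\n", $path, -s $path;
```

This should report 2 bytes, matching the first command above, rather than the 1 byte of the autodie runs.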

The other problem is that use utf8 doesn’t really work well on package globals, because of problems with the way package symbol tables are accessed as byte strings. There is also the question of what to do about something like:

use Weather::El_Niño;

That has to map to the filesystem, and now what do you do? Just use the bytes as they are? Normalize to UTF‑8? Downgrade to Latin1 (which it might have already been)? Did you know that (for very good reasons) the HFS+ filesystem used by Darwin always converts filenames into NFD, the canonically decomposed form? So be careful when checking filenames! You can’t just say:

@files = grep { /Niñ/ } glob("{El,La}_*");

That’s because if you type it in the normal way, your pattern will contain U+00F1 LATIN SMALL LETTER N WITH TILDE (the NFC form), but the names coming back from the filesystem will have an "n" followed by U+0303 COMBINING TILDE (the NFD form): two separate code points instead of one, so the match fails.
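One common fix is to normalize each filename to NFC with the core Unicode::Normalize module before matching. The sketch below does not touch a real HFS+ volume; it simulates the decomposed names that glob() would hand back by running them through NFD itself. The variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames qw< :full >;             # \N{NAME} escapes (implicit since 5.16)
use Unicode::Normalize qw< NFC NFD >;

my $n_tilde = "\N{LATIN SMALL LETTER N WITH TILDE}";   # U+00F1, precomposed

# What you type: the precomposed (NFC) spelling.
my $typed = "El_Ni${n_tilde}o";

# Simulated filesystem results: decomposed (NFD), "n" + U+0303 COMBINING TILDE.
my @from_fs = map { NFD($_) } ($typed, "La_Ni${n_tilde}a");

printf "typed: %d code points; from the filesystem: %d code points\n",
    length($typed), length($from_fs[0]);

my $pattern = qr/Ni$n_tilde/;

# Naive match: an NFC pattern never matches the NFD names.
my $naive = grep { $_ =~ $pattern } @from_fs;

# Normalizing each name to NFC first makes both match.
my $fixed = grep { NFC($_) =~ $pattern } @from_fs;

print "naive matches: $naive; after NFC: $fixed\n";
```

It reports 7 versus 8 code points, and 0 naive matches versus 2 after normalization. Normalizing the names rather than the pattern keeps the pattern readable and works whichever form the filesystem returns.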

There is a Google Summer of Code project for cleaning up Perl’s tokenizer vis‐à‐vis 8‑bit names, including for UTF‑8. I am convinced that this can and shall be fixed.
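To make the use Weather::El_Niño question from above concrete, this sketch builds the relative path a bareword use/require looks for (:: becomes / and .pm is appended, as perlfunc documents) and prints three candidate byte spellings of it. Which one perl should actually hand to the filesystem is exactly the open question, so the sketch deliberately picks none; the %bytes labels are just illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames qw< :full >;
use Encode qw< encode >;
use Unicode::Normalize qw< NFC NFD >;

# The module name, spelled with an escape so this file's own encoding is moot.
my $module = "Weather::El_Ni\N{LATIN SMALL LETTER N WITH TILDE}o";

# Bareword require: "::" becomes "/", then ".pm" is appended.
(my $relpath = $module) =~ s{::}{/}g;
$relpath .= ".pm";

# Illustrative labels for three plausible on-disk byte strings of the same path.
my %bytes = (
    'Latin-1'   => encode('ISO-8859-1', $relpath),
    'UTF-8 NFC' => encode('UTF-8', NFC($relpath)),
    'UTF-8 NFD' => encode('UTF-8', NFD($relpath)),
);

for my $form (sort keys %bytes) {
    printf "%-9s %2d bytes: %s\n", $form, length($bytes{$form}),
        join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes{$form}));
}
```

The same 18-character path comes out as 18, 19, or 20 bytes, and only one of them can be the name actually stored on disk.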

I see my stalker is back. Yawn!

In reply to Re^3: 'use' inside or outside of package declaration? by tchrist
in thread 'use' inside or outside of package declaration? by John M. Dlugosz
