http://qs1969.pair.com?node_id=11117703


in reply to Re^10: Summing numbers in a file
in thread Summing numbers in a file

given a CODE ref in a structure somewhere, how do you get the closed over values out of it?

Ok, I see what you're asking now. Whenever you want to inspect lexical variables for debugging, think PadWalker (that's also what the Perl debugger uses, as per its docs). If you wanted to get fancy, combine it with Data::Dump::Filtered and B::Deparse:

    use warnings;
    use strict;
    use B::Deparse;
    use Data::Dump qw/dd pp/;
    use Data::Dump::Filtered qw/add_dump_filter/;
    use Sub::Util qw/subname/;
    use PadWalker qw/closed_over/;

    my $deparse = B::Deparse->new();
    $deparse->ambient_pragmas(strict=>'all', warnings=>'all');
    add_dump_filter(sub {
        my ($ctx, $obj) = @_;
        if ( $ctx->is_code && subname($obj)=~/\b__ANON__\z/ ) {
            my $vars = closed_over($obj);
            return {
                dump => 'sub '.$deparse->coderef2text($obj),
                comment => join("\n", map {
                    my $v = $vars->{$_};
                    "my $_ = ".(
                        ref $v eq 'ARRAY' ? pp(@$v)
                        : ref $v eq 'HASH' ? "(".join(", ", map {
                            pp($_).' => ' .pp($v->{$_}) } sort keys %$v ).')'
                        : pp($$v) ).';'
                } sort keys %$vars ),
            }
        }
        return undef;
    });

    sub x {
        my $i = shift;
        my $y = { val=>123 };
        return sub {
            $y->{val} += $i;
            my $z = 2;
            $i *= $z;
            return $y;
        }
    }
    my $foo = { bar => x(111), };
    dd $foo;
    dd $foo->{bar}->();
    dd $foo;

    __END__

    {
      bar => # my $i = 111;
             # my $y = { val => 123 };
             sub {
                 $y->{'val'} += $i;
                 my $z = 2;
                 $i *= $z;
                 return $y;
             },
    }
    { val => 234 }
    {
      bar => # my $i = 222;
             # my $y = { val => 234 };
             sub {
                 $y->{'val'} += $i;
                 my $z = 2;
                 $i *= $z;
                 return $y;
             },
    }

Is this the origin of the convention in Perl of writing file handle names in ALL UPPERCASE?

I don't know, but that sounds plausible. All I've found in the Camel 2nd Ed. so far is:

Since reserved words are always entirely lowercase, we recommend that you pick label and filehandle names that do not appear all in lowercase. ... Using uppercase filehandles also improves readability and protects you from conflict with future reserved words. ... if you have a package called m, s, y, or tr, then you can't use the qualified form of an identifier as a filehandle because it will be interpreted instead as a pattern match, a substitution, or a translation. Using uppercase package names avoids this problem.
This can be a legitimate concern in that you will get only a run-time error instead of a compiler error

No, you don't even get a run-time error, just a warning, which can easily be swallowed or go unnoticed if running the script from a web server, a daemon, a cron that can't send mails, a script that generates a lot of other output, and so on.
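
To make the difference concrete, here is a minimal sketch. The handle names MANIFEST/MAINFEST and the variables are only illustrative, and the warnings are collected in an array just so they can be inspected; normally they would go to STDERR. The lexical version is wrapped in a string eval only so that its compile-time failure can be shown without aborting the script.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them scroll past on STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Bareword handle with a typo: this compiles fine, even under strict.
open MANIFEST, '>', \my $buffer or die "open failed: $!";
my $ok = print MAINFEST "hello\n";    # typo: MAINFEST was never opened
close MANIFEST;

# Lexical handle with a typo: under strict this does not even compile.
my $compiled = eval q{
    use strict;
    open my $fh, '>', \my $buf or die "open failed: $!";
    print $fg "hello\n";              # typo: $fg was never declared
    1;
};
my $compile_error = $@;

print "bareword typo: print returned false, warnings: @warnings";
print "lexical typo:  $compile_error";
```

The bareword typo only produces a false return value from print plus a warning, while the lexical typo stops the code from running at all.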

I have yet to write code that had enough bareword file handles for this to be a problem for me. ... This is probably why I have not had typo problems with them, now that I notice it.

Once again, this may apply to you, but I still don't see any good reason to suggest them to newcomers. Lexical filehandles have become the new best practice for lots of good reasons, and I think I've named pretty much all of them by now. One more might be that lexical filehandles are closed automatically when they go out of scope, but bareword filehandles are not (only when the script ends, which may be a lot later, depending on the code).
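
A small sketch of the scoping difference, assuming a Unix-like system and using a temporary directory from File::Temp; the file names and the handle name BARE are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw/tempdir/;

my $dir = tempdir(CLEANUP => 1);

# Lexical handle: flushed and closed when $fh goes out of scope.
{
    open my $fh, '>', "$dir/lexical.txt" or die "open failed: $!";
    print $fh "hello\n";
}   # no explicit close needed
my $lexical_size = -s "$dir/lexical.txt";

# Bareword handle: package-global, so it is still open after the block.
{
    open BARE, '>', "$dir/bareword.txt" or die "open failed: $!";
    print BARE "hello\n";
}
my $bareword_still_open = defined fileno(BARE);
close BARE or die "close failed: $!";

print "lexical file size after the block: $lexical_size\n";
print "bareword handle still open: ", ($bareword_still_open ? "yes" : "no"), "\n";
```

The data written through the lexical handle is on disk as soon as the block ends, whereas the bareword handle stays open until it is closed explicitly or the script exits.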

bareword file handles remain a useful tool in producing concise I/O code in a main script ...

But is that really the case? You mentioned using a filehandle named MANIFEST, and you said "Global file handles should have meaningful names." That makes sense since they're package-global, but for lexical filehandles that have limited scope, using a name like $fh is perfectly fine IMHO.

    open my $fh, '<', 'Manifest' or die $!; my $data = do { local $/; <$fh> }; close $fh;

is 12 characters shorter than

    open MANIFEST, '<', 'Manifest' or die $!; my $data = do { local $/; <MANIFEST> }; close MANIFEST;

and the latter suffers from all the issues I've already named.

... while also being visually distinctive in their own way, with both ALLCAPS and different syntax highlighting from variables.

If that's really the only remaining advantage, which is strongly dependent on the editor and highlighting scheme, then I don't really see how that stacks up against all the disadvantages I named (and that you haven't responded to).

but I am suspicious of "always" and "never" in general

I know what you mean, and I feel the same way, but I don't think I said "never" anywhere in this thread. I'm a big fan of TIMTOWTDI, but also Tim Toady Bicarbonate. Don't confuse strict coding policies with best practices - the name of the latter already implies that there are other ways to accomplish the same thing, but best practices usually exist because they have advantages over the other practices, they solve issues that other methods have, and so on.

In this case, bareword filehandles are no longer the generally recommended practice in part because of the horrible debugging messes that people have had to deal with over the years as a result of globals. Lexicals are simply better, which is far from saying "never use globals". But design decisions made to limit globals can also have positive effects on application architecture.

Although Perl has traditionally often been used for scripts (I've written plenty myself, and yes, I've used "globals" in a lot of them), what really opened my eyes was spending a few years coding more Java than Perl. In large applications, and especially multi-threaded ones, you just can't use globals, and for example, context objects (when properly implemented) make a ton more sense - they're thread-safe, serializable, etc. and replace the functionality that people often use globals for.

Just for example, someone coding in Mojolicious, where the production-mode server is multiprocess, may try to use globals to share information, only to find that it simply doesn't work, but someone already used to limiting the scope of their variables as much as possible will likely find it much easier to deal with - at least that's been my experience.
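
The effect can be reproduced without Mojolicious. The following is only a sketch using a plain fork on a Unix-like system, not the actual prefork server, and $counter is a hypothetical piece of "shared" state:

```perl
use strict;
use warnings;
use POSIX ();

our $counter = 0;    # package global, meant to be "shared"

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {     # child process
    close $reader;
    $counter++;      # changes only the child's copy of the global
    print $writer $counter;
    close $writer;
    POSIX::_exit(0); # leave the child without running END blocks
}

close $writer;
my $child_value = <$reader>;    # what the child saw after incrementing
close $reader;
waitpid($pid, 0);
my $parent_value = $counter;    # the parent's copy is untouched

print "child saw $child_value, parent still sees $parent_value\n";
```

Each process has its own copy of the global, so information has to be passed explicitly (here through a pipe) instead of through the variable.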

The concern I have is the possibility of advice that works well for a beginner, but could unintentionally limit their future growth. I am unsure exactly how "always use lexical file handles" would do that

That might be an important thing to think about, then. Given that they have so many disadvantages, other than making people aware that bareword filehandles exist, what else is there? I think it'd be enough to say something like "by the way, bareword filehandles exist and you may see them in legacy code, but nowadays the best practice is to use lexical filehandles because they have many advantages".

Re^12: Summing numbers in a file
by jcb (Parson) on Jun 05, 2020 at 02:34 UTC
    One more might be that lexical filehandles are closed automatically when they go out of scope, but bareword filehandles are not (only when the script ends, which may be a lot later, depending on the code).

    A lexical file handle declared at top-level only goes out of scope at the end of the script, just like a bareword file handle. We have gone way off into the weeds and have now talked past each other. I will just agree to disagree and carry on.

      I will just agree to disagree and carry on.

      If you don't feel like continuing the discussion, that's fine with me. I actually did think a fair bit about whether to respond, but in the end I did want to complete my list of arguments for lexical filehandles so that this thread might be a useful collection of such arguments for others as well.

      I am also left wondering a bit what exactly you disagree with. That you will continue to use them? That's fine with me, as I said early on. That bareword filehandles shouldn't be recommended to newcomers? I still believe that and have provided plenty of arguments. That there is no functional difference? I feel like I've shown this isn't true. Anyway, on to the other points:

      A lexical file handle declared at top-level only goes out of scope at the end of the script, just like a bareword file handle.

      Right, that's true, and yes, part of my argument was for a broader case. Much of what I've said applies no matter the scope of the filehandles.

      We have gone way off into the weeds and have now talked past each other.

      I feel like I've been responding to nearly every one of the points you made. But to re-focus, your initial post was:

      ... I will quibble with ["the preferred style is to use a scalar variable for a file handle"] at top-level as in this case: there is no functional difference between the lexical file handles in your example and the traditional global file handles — in both cases, a handle opened at top-level is defined until the end of the script and valid until closed. Please correct me if I am somehow misinformed about this.

      I've already given plenty of examples of how "there is no functional difference between the lexical file handles in your example and the traditional global file handles" isn't true - unless by "functional" you really just mean "it works", to which I'd say that code using bareword filehandles can stop working much more easily than code using lexical filehandles, and I've named examples of how that can happen (name collisions etc.).

      As for the specific "in both cases, a handle opened at top-level is defined until the end of the script and valid until closed", a nitpicky response to that is that a lexical filehandle can simply be assigned to, as in $fh = undef;, to undefine it, close it, and cause fatal errors in case one attempts to use it again, but I haven't yet found an equivalent for bareword handles.
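
A minimal sketch of that behaviour, using a temporary file from File::Temp (the path and variable names are made up for the example, and print is used as the "attempt to use it again"):

```perl
use strict;
use warnings;
use File::Temp qw/tempdir/;

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt";

open my $fh, '>', $path or die "open failed: $!";
print $fh "first\n";

$fh = undef;    # drops the last reference: the file is flushed and closed
my $size_after_undef = -s $path;

# Printing to the undefined handle is now a fatal error, not just a warning.
my $survived = eval { print $fh "second\n"; 1 };
my $error    = $@;

print "size after undef: $size_after_undef\n";
print "print after undef died with: $error";
```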

      The closest I've found so far is use Symbol qw/geniosym/; *FH = geniosym; (thanks to Discipulus for pointing this out), which closes the underlying filehandle, but aside from being cumbersome to say, afterwards print FH ... will only cause a warning, not an error, which I think is a serious drawback. Other methods, like undef *FH, *FH = *DUMMY, or *FH = do { local *HANDLE; \*HANDLE } have the major disadvantage that they affect every package variable with the name FH, as in $FH, @FH, %FH, etc. Maybe another Monk knows if there's a real equivalent (it's probably possible with XS), but for now, I have my doubts.
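
To illustrate the side effect of the glob-based approaches, here is a small sketch using undef *FH; the contents of $FH and @FH are arbitrary, and an in-memory file stands in for a real one:

```perl
use strict;
use warnings;

our $FH = 'an unrelated scalar';
our @FH = (1, 2, 3);

my $text = "line one\n";
open FH, '<', \$text or die "open failed: $!";

undef *FH;    # gets rid of the filehandle ...

# ... but also of every other package variable named FH.
my $scalar_survived = defined $FH;
my $array_count     = scalar @FH;

print "\$FH still defined: ", ($scalar_survived ? "yes" : "no"), "\n";
print "elements left in \@FH: $array_count\n";
```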

      Another thing to nitpick is that often, code examples posted here are SSCCEs, i.e. the asker has taken code out of a sub and used it at the top level, or someone answering the question is showing a much simplified example where the code is in the file scope, but the code is actually intended to go into a sub somewhere. Since you've said yourself you think bareword filehandles are incorrect in this case, this means that everyone who sees bareword filehandles would have to know and remember to rewrite them into lexical ones when refactoring code from the top level into a sub. Just using lexicals everywhere seems a much easier solution.

      Sorry, but I don't really see what arguments for bareword filehandles are left, other than making newcomers to the language aware of the fact that they exist but are not a best practice.

      And again, all this is not to say that I hate them or they should "never, ever" be used. Bareword filehandles are definitely something that make Perl interesting; for example in the one-argument open:

      our $FH = "input.txt"; open FH or die $!; # opens input.txt !

      But there's a huge difference between writing "interesting" code, which I do enjoy, and writing modern, robust, well-maintainable applications.

      Update:

      I don't think it's been referenced in this thread yet, but of course TheDamian's Perl Best Practices has several pages of arguments at the beginning of Chapter 10 (I/O) against bareword filehandles and for lexical ones ("indirect filehandles"), and there's the corresponding Perl::Critic::Policy::InputOutput::ProhibitBarewordFileHandles (defaulting to the highest severity), and chromatic's Modern Perl also names them in the chapter "What to Avoid".