PerlMonks  

"open" Best Practices

by haukex (Archbishop)
on Jul 11, 2019 at 14:41 UTC

TL;DR: open my $fh, '<', $filename or die "$filename: $!";

You will see styles of open such as "open FILE, $filename;" or "open(LOG, ">$filename") || die "Could not open $filename";" in many places. These mainly come from versions of Perl before 5.6.0 (released in 2000), because that version of Perl introduced lexical filehandles and the three-argument open. Since then, these new features have become a best practice, for the reasons below.

1. Use Lexical Filehandles

Instead of open FILE, ..., say: open my $fh, ....

Lexical filehandles have the advantage of not being global variables, and such filehandles will be automatically closed when the variable goes out of scope. You can use them just like any other filehandle, e.g. instead of print FILE "Output", you just say print $fh "Output". They're also more convenient to pass as parameters to subs. Also, "bareword" filehandles like FILE have a potential for conflicts with package names (see choroba's reply for details), and they don't protect against typos like lexical filehandles do! (For two recent discussions on lexical vs. bareword filehandles, see this and this thread.)
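As a minimal sketch of the points above (the helper name write_report and the scratch file are assumptions made for illustration, not part of the original node), the following shows a lexical filehandle being passed to a sub and being closed automatically when it goes out of scope:

```perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

# Hypothetical helper: takes any filehandle as an ordinary argument.
sub write_report {
    my ($fh, @lines) = @_;
    print {$fh} "$_\n" for @lines;
    return scalar @lines;
}

# Scratch file so the sketch is self-contained.
my (undef, $filename) = tempfile(UNLINK => 1);

{
    open my $fh, '>', $filename or die "'$filename': $!";
    write_report($fh, 'first line', 'second line');
    # No explicit close: $fh goes out of scope at the end of this block,
    # which flushes and closes the handle automatically.
}

open my $in, '<', $filename or die "'$filename': $!";
my @lines = <$in>;
close $in;
print "read ", scalar(@lines), " lines\n";
```

A typo such as $hf instead of $fh would be caught at compile time under "use strict", which is the typo protection that bareword handles lack.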

2. Use the Three-Argument Form

Instead of open my $fh, ">$filename", say: open my $fh, '>', $filename.

In the two-argument form of open, the filename has to be parsed for the presence of mode characters such as >, +<, or |. If you say open my $fh, $filename, and $filename contains such characters, the open may not do what you want, or worse, if $filename is user input, this may be a security risk! The two-argument form can still be useful in rare cases, but I strongly recommend playing it safe and using the three-argument form instead.
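To make the risk concrete, here is a small sketch that only touches a temporary directory. The file name victim.txt and the helper reset_victim are hypothetical; the "user input" is simply a string with a leading > character:

```perl
use strict;
use warnings;
use File::Temp qw/tempdir/;

my $dir    = tempdir(CLEANUP => 1);
my $victim = "$dir/victim.txt";

# Hypothetical helper to (re)create the file with known content.
sub reset_victim {
    open my $fh, '>', $victim or die "'$victim': $!";
    print {$fh} "important data\n";
    close $fh or die "'$victim': $!";
}

# Pretend this string came from user input; note the leading '>'.
my $user_input = ">$victim";

# Three-argument form: the string is only ever a filename, so this simply
# fails because no file with that literal name exists.
reset_victim();
my $three_arg_ok     = open(my $fh3, '<', $user_input) ? 1 : 0;
my $size_after_three = (-s $victim) || 0;

# Two-argument form: the leading '>' is parsed as a mode, so the file is
# opened for writing and truncated.
reset_victim();
my $two_arg_ok = open(my $fh2, $user_input) ? 1 : 0;
close $fh2 if $two_arg_ok;
my $size_after_two = (-s $victim) || 0;

print "three-arg: opened=$three_arg_ok, size=$size_after_three\n";
print "two-arg:   opened=$two_arg_ok, size=$size_after_two\n";
```

With the three-argument form the file keeps its content; with the two-argument form the same string silently empties it.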

In the three-argument form, $filename will always be taken as a filename. Plus, the mode can include "layers", so instead of having to do a binmode after the open, you can just say e.g. open my $fh, "<:raw", $filename, or you can specify an encoding such as open my $fh, ">:encoding(UTF-8)", $filename. Note: As documented, :encoding(UTF-8) should be preferred over :utf8, and on Windows, to decode UTF-16 properly, you need to say ":raw:encoding(UTF-16):crlf", because otherwise the default :crlf layer will incorrectly mangle the Unicode characters U+0D0A or U+0A0D. Be aware that if you don't specify any layers, the layers in ${^OPEN} are used (see that link for details).
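The effect of layers can be seen in a short sketch like the following, which assumes nothing beyond core Perl and a scratch file. It writes one string through an :encoding(UTF-8) layer, then reads the file back once as raw bytes and once as decoded characters:

```perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

my (undef, $filename) = tempfile(UNLINK => 1);

# "naive" with U+00EF in the middle: 5 characters, 6 bytes in UTF-8.
my $text = "na\x{EF}ve";

# Encode on the way out: the layer is part of the mode argument.
open my $out, '>:encoding(UTF-8)', $filename or die "'$filename': $!";
print {$out} $text;
close $out or die "'$filename': $!";

# Read the raw bytes back, with no decoding and no newline translation.
open my $raw, '<:raw', $filename or die "'$filename': $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read the same file as characters by decoding it again.
open my $in, '<:encoding(UTF-8)', $filename or die "'$filename': $!";
my $chars = do { local $/; <$in> };
close $in;

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

The byte count is one higher than the character count because U+00EF takes two bytes in UTF-8.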

3. Check and Handle Errors

open my $fh, '<', $filename;                          # Bad: No error handling!
open my $fh, '<', $filename  || die ...;              # WRONG!1
open my $fh, '<', $filename  or die "open failed";    # error is missing info

open my $fh, '<', $filename  or die "$filename: $!";  # good
open(my $fh, '<', $filename) or die "$filename: $!";  # good
open(my $fh, '<', $filename) || die "$filename: $!";  # works, but risky!1

use autodie qw/open/;  # at the top of your script / code block
open my $fh, '<', $filename;                          # ok, but read autodie!

You should check the return value of the open function, and if it returns a false value, report the error that is available in the $! variable. It is best to report the filename as well, and of course you're free to customize the message as needed (see the tips below for some suggestions).

1 It is a common mistake to use open my $fh, '<', $filename || die ... - because of the higher precedence of ||, it actually means open( my $fh, '<', ($filename || die ...) ). So to avoid mistakes, I would suggest just staying away from || in this case (as is also highlighted in these replies by AM and eyepopslikeamosquito).

Note that open failing does not necessarily have to be a fatal error; see some examples of alternatives here. Also, note that the effect of autodie is limited to its lexical scope, so it's possible to turn it on for only smaller blocks of code (as discussed in kcott's reply).
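Both ideas in the paragraph above can be sketched roughly as follows. The sub names read_optional_config and open_or_throw and the missing file are hypothetical; the first sub treats a failed open as non-fatal, and the second enables autodie only inside its own block:

```perl
use strict;
use warnings;
use File::Temp qw/tempdir/;

my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/does_not_exist.conf";   # hypothetical optional file

# Non-fatal handling: fall back to an empty list when the open fails.
sub read_optional_config {
    my ($filename) = @_;
    if ( open my $fh, '<', $filename ) {
        chomp( my @lines = <$fh> );
        close $fh;
        return \@lines;
    }
    warn "Skipping '$filename': $!\n";
    return [];
}

# autodie is lexically scoped: it only affects open calls in this sub.
sub open_or_throw {
    my ($filename) = @_;
    use autodie qw/open/;
    open my $fh, '<', $filename;    # throws an exception on failure
    return $fh;
}

my $config = read_optional_config($missing);
my $ok     = eval { open_or_throw($missing); 1 };

print "config lines: ", scalar(@$config), "\n";
print "autodie threw: ", ( $ok ? "no" : "yes" ), "\n";
```

The open in read_optional_config keeps returning false on failure because the "use autodie" statement sits in a different block.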

4. Additional Tips

  • Make sure that the filename you're opening always matches the filename in the error message. One easy way to accomplish this is to use a single variable to hold the filename, like $filename in the above examples (as described in Eily's reply).
  • Consider putting the filename in the error message in quotes or similar, such as "'$filename': $!", so that it's easier to see issues arising from whitespace at the beginning or end of the filename (as suggested in Discipulus's reply).
  • In addition, consider adding even more useful details to your error message, such as whether you're trying to read or write from/to the file, and put quotes around $! as well, so it's easier to tell everything apart (as suggested by haj).
  • On Windows, consider also displaying $^E as part of the error message for more information (as suggested in Discipulus's reply).
  • If you're setting global variables that will affect reading the file, like $/, it's best to use local in a new block (as mentioned in stevieb's reply).
  • Remember that it's possible for multiple processes to access the same file at the same time, and you may need to consider a way to coordinate that, such as file locking (as mentioned in davido's reply).
  • For even more discussion, see Chapter 10, "I/O", in the book Perl Best Practices by TheDamian; the book Modern Perl by chromatic is also a great resource on more modern Perl.
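Several of the tips above can be combined in one rough sketch. The helper names append_line and slurp are assumptions for illustration; the error messages quote both the filename and $! and name the operation, $/ is localized inside a block, and flock provides advisory locking (it only coordinates processes that also call flock):

```perl
use strict;
use warnings;
use Fcntl qw/:flock/;
use File::Temp qw/tempfile/;

# Hypothetical helper: append one line while holding an exclusive lock.
sub append_line {
    my ($filename, $line) = @_;
    open my $fh, '>>', $filename
        or die "Failed to open '$filename' for appending: '$!'";
    flock($fh, LOCK_EX) or die "Failed to lock '$filename': '$!'";
    print {$fh} "$line\n";
    # Closing the handle also releases the lock.
    close $fh or die "Failed to close '$filename': '$!'";
    return;
}

# Hypothetical helper: read the whole file under a shared lock.
sub slurp {
    my ($filename) = @_;
    open my $fh, '<', $filename
        or die "Failed to open '$filename' for reading: '$!'";
    flock($fh, LOCK_SH) or die "Failed to lock '$filename': '$!'";
    local $/;    # slurp mode, restored when this sub returns
    my $content = <$fh>;
    close $fh;
    return $content;
}

my (undef, $filename) = tempfile(UNLINK => 1);
append_line($filename, $_) for qw/alpha beta/;
my $content = slurp($filename);
print $content;
```

Because each helper takes the filename in a single variable, the name in the error message always matches the file that was actually opened.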

Fellow Monks: I wrote this so I would have something to link to instead of repeating these points again and again. If there's something you think is worth adding, please feel free to suggest it!

Update 2019-07-12: Added section "Additional Tips", mentioned bareword filehandles, and added a bit more on autodie. Thanks to everyone for your suggestions! 2019-07-13: Added more suggestions from replies, thanks! 2020-04-19: Added mention of typo prevention, as inspired by lexical vs. local file handles. 2020-06-07: Added links to threads about bareword vs. lexical handles and added note about :crlf and UTF-16 interaction on Windows. 2022-02-08: Updated notes on layers.

Replies are listed 'Best First'.
Re: "open" Best Practices
by Eily (Monsignor) on Jul 11, 2019 at 15:53 UTC

    ++ For making this :)

    Unless you are using autodie, I'd say: always use a temporary variable for the filename, to use both in the call to open and in the error message. This is to avoid having an error message that says that file A is missing, or can't be read, when you were actually trying to read file B, and to avoid looking in the wrong folder. Two examples (both of which I've been guilty of):

    open my $input, '<', "my_input.xml" or die "Can't open my_input.csv: $!";
    # Whoops, my_input.xml does not exist but the my_input.csv file does

    my $input_file = "my_input.xml";
    open my $input, '<', $input_file or die "Can't open $input_file: $!";
    # Still doesn't work but the error message is correct

    my $file = get_file();
    my $folder = get_folder(); # The function returned an empty string by mistake
    open my $data, '<', "$folder/$file" or die "Can't open $file; $!";
    # But I see the file in my folder, why does it say it doesn't exist?

    my $input_file = "$folder/$file";
    open my $data, '<', $input_file or die "Can't open $input_file: $!";
    # You'll see straightaway that the folder is empty

    Also, this isn't really specific to open, but try to give meaningful names to your variables. Having to use numbers is nearly always a sign that the name is wrong. $input and $output are always better than $file and $file2. $reference_data and $new_data are also better than $fh1 and $fh2.

Re: "open" Best Practices
by choroba (Cardinal) on Jul 11, 2019 at 16:50 UTC
Re: "open" Best Practices
by davido (Cardinal) on Jul 11, 2019 at 22:42 UTC

    Perhaps mention something about paying attention to the ownership and permissions on newly created paths and files, and to consider the ramifications of locking (or failing to do so). So often we envision our solutions in terms of simple development environments not considering that there's a real world where processes can compete with each other over the same resource, or where not everyone is the same person as $ENV{'USER'}, and so on. At minimum, if our code is expected to be the only instance of itself running (so that it's not competing for a resource with other instances of itself) are we doing anything to assure it is the only instance? Have we considered that someone might be writing to, unlinking, moving, copying, reading from the same file we're working with?

    Using open correctly is the easy part.


    Dave

      Using open correctly is the easy part.

      Yes, and hence my relatively short node - I really just wanted to focus on the open statement itself, so I'd have something to link to in posts like this one. But I've added a mention of your suggestion to the root node, thanks!

      "...the easy part."

      Well said. Some callow thoughts about some situations when things get a little bit harder: I just wondered if some ugly locking stuff could be handled with mce_open from MCE::Shared::Handle. Probably this may be yet another case of total abuse of a module. And testing with lsof if a file is in use might be an option from time to time as well. Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

        Hi karlgoethebier,

        The mce_open call behaves similarly to the native open in Perl. It creates a shared handle with exclusive locking handled transparently.

        Example:

        use strict;
        use warnings;
        use feature 'say';

        use MCE::Hobo;
        use MCE::Shared;

        # Passing a file ref without IO::FDPass will work for \*STDIN,
        # \*STDOUT, \*STDERR, and \*main::DATA only. Otherwise, providing
        # the actual path is preferred (i.e. "/path/to/file").

        # Double quotes so that $! is interpolated into the message.
        mce_open my $out, '>>', \*STDOUT or die "open failed: $!";
        mce_open my $err, '>>', \*STDERR or die "open failed: $!";

        printf $out "shared fileno(\$out) is %d\n", fileno($out);
        printf $out "shared fileno(\$err) is %d\n", fileno($err);

        if ($^O ne 'MSWin32') {
            mce_open my $log, '>>', '/dev/null' or die "open failed: $!";
            say $log "Hello, there!";  # sent to null
        }

        # The shared-handles work with threads, MCE::Hobo, MCE::Child,
        # Parallel::ForkManager, and other parallel modules on CPAN.
        # Note: There is no reason to choose MCE::Child over MCE::Hobo
        # if already involving the shared-manager.

        sub foo {
            my ($id) = @_;
            say $out "Hello, from pid $$";
        }

        MCE::Hobo->create('foo', $_) for 1..4;
        MCE::Hobo->wait_all;

        Output:

        shared fileno($out) is 1
        shared fileno($err) is 2
        Hello, from pid 4651
        Hello, from pid 4652
        Hello, from pid 4653
        Hello, from pid 4654

        See also:

        https://metacpan.org/pod/MCE::Shared#DBM-SHARING
        https://metacpan.org/pod/MCE::Shared#LOGGER-DEMONSTRATION

        Regards, Mario

Re: "open" Best Practices
by Discipulus (Canon) on Jul 12, 2019 at 07:33 UTC
    Thanks haukex for this, I vote for you and hippo as new perlmonks tutorial organiser ;) I confused haukex and hippo.. both are anyway worth thanking ;)

    > It is best to also report the filename as well, and of course you're free to customize the message as needed.

    About this I almost always use the filename within square brackets or single quotes, to avoid nasty empty space problems (even if your use, with filename as first element and followed by : can spot these problems too). I also spend more words in errors, just in case:

    open my $fh, '<', $filename or die "Unable to open [$filename] because of: $!";

    Also, being on an unfortunate OS, sometimes I add $^E aka $EXTENDED_OS_ERROR (under MSWin32 $^E can be different from $!):

    open my $fh, '<', $filename or die "Unable to open [$filename] because of: $! $^E";

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: "open" Best Practices
by kcott (Archbishop) on Jul 12, 2019 at 08:31 UTC

    G'day haukex,

    ++ Good write up and definitely handy for linking: I've certainly written similar information in dozens of replies over the years.

    "use autodie qw/open/;  # at the top of your code"

    Possibly out of scope for this; but, it may be worth mentioning the lexical nature of autodie.

    I've worked on a lot of production code that uses in-house I/O error handling; e.g. the custom handling includes logging, email alerts, popup messages, and so on. At times, I don't want this for a small part of code I've added; however, I also don't want to interfere with the existing, custom error handling.

    A very rough example might be something like:

    ...
    use InHouse::IO::ErrorHandling;
    ...
    open ...; # in-house handling here
    ...
    {
        use autodie;
        ...
        open ...; # autodie handling here
        ...
    }
    ...
    open ...; # in-house handling here
    ...

    — Ken

Re: "open" Best Practices
by stevieb (Canon) on Jul 11, 2019 at 14:53 UTC

    Nice! I'd add a blurb about using close(), as well as a scoped block example (perhaps linking the two). One good example of when to use a block is using a local record separator variable (the close isn't really necessary in this case, but I digress):

    my $file = 'test.txt';
    my $json;
    {
        local $/;
        open my $fh, '<', $file or die $!;
        $json = <$fh>;
        close $fh;
    }

      Thanks, and yes, good points! The node is aimed at people using the open statement in a "non-best-practice" way, so I'd like to keep the node short enough so that they might still read it (hence all the bold text and TL;DR at the top), and I'd like to keep the focus on the open statement itself - I left out a discussion e.g. of piped opens and checking close for errors and such intentionally, because when answering questions about that, I have other nodes I'd link to instead. Perhaps your reply to my node pointing these things out is enough? :-)

      Update: Ok, I've updated the root node!

Re: "open" Best Practices
by swl (Parson) on Jul 13, 2019 at 03:45 UTC

    Thanks for the write-up, it's very useful. The first para can be read to imply that the not-so-good options are currently best practices.

    You will see styles of open such as "open FILE, $filename;" or "open(LOG, ">$filename") || die "Could not open $filename";" in many places. These mainly come from versions of Perl before 5.6.0 (released in 2000), because that version of Perl introduced lexical filehandles and the three-argument open. Since then, they have become a best practice, for the reasons below.

    Perhaps reword "they have become best practice" to something like "the three argument open has become a best practice".

      I thought about how to make it more clear without repeating myself too much, so I hope I've improved the wording now. Thank you for pointing this out!

Re: "open" Best Practices
by VinsWorldcom (Prior) on Jul 11, 2019 at 16:05 UTC

    Thank you! Can't say I've always done that in the past (although I've been using the 3-arg form for..?ever?). The "or die" with $!:

    open my $fh, '<', $file or die "$0: cannot open $file: $!";

    has been updated in my quick snippets!

    UPDATE: removed extra semicolon after first line - thanks haukex

Re: "open" Best Practices
by haj (Vicar) on Jul 12, 2019 at 17:45 UTC

    Nice writeup, indeed good for linking to.

    Some minor nitpickings:

    • I like to enclose both the file name and the error messages in quotes, to separate them clearly from your own message text. Especially error messages usually have several words, which sometimes makes parsing of the sentence difficult.
    • I would also include the operation (read, write) which was desired.

    By the way: autodie does all of that :)

Re: "open" Best Practices
by Anonymous Monk on Jul 13, 2019 at 02:18 UTC

      Well, I disagree with the "never", but I did strengthen my warning against it a bit, thanks! BTW, some of those links are of course very relevant, but some of them (e.g. the first two) don't mention the || vs or at all.

        I always use the low precedence and and or operators for flow of control, for example preferring:

        open(my $fh, '<', $file) or die "error opening '$file': $!";
        to the equivalent:
        open(my $fh, '<', $file) || die "error opening '$file': $!";

        while always using && and || inside logical expressions, for example preferring:

        if ($x > 5 || $y < 10) ...
        to the equivalent:
        if ($x > 5 or $y < 10) ...

        When this style is followed consistently, I find the code easier to read and understand at a glance.

        See also Perl Best Practices, Chapter 4, "Values and Expressions", "Don't mix high- and low-precedence booleans" item.

Node Type: perlmeditation [id://11102684]
Approved by Corion
Front-paged by LanX