http://qs1969.pair.com?node_id=1217182

haukex has asked for the wisdom of the Perl Monks concerning the following question:

I sometimes use (and suggest) this to slurp a file in a single line:

my $data = do { open my $fh, '<', $file or die $!; local $/; <$fh> };

The lexical filehandle should be automatically closed at the end of the block. Can anyone think of any downsides to the above? Like perhaps some subtle scoping issue, or it's bad style to rely on the implicit close, etc. ... or am I just being paranoid?

Update: Typo fix, $/ not $\, thanks Corion!

Re: Any downsides to this slurp idiom?
by BrowserUk (Patriarch) on Jun 22, 2018 at 14:29 UTC

    If the file you're slurping is of any significant size, you're better off doing:

    my $s; do{ local( @ARGV, $/)='big.file'; $s = <> };
    than
    my $s = do{ local( @ARGV, $/)='big.file'; <> };

    This is because the latter makes two copies of the file's contents: one in a temporary, and then a copy in $s.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit

      That's for 5.18 and older versions of Perl. Perhaps it's related to copy-on-write (COW) becoming the default in 5.20; the same mechanism would optimize the extra copy away.

        I'm sorry to ask, but I'm interested and not sure I follow. Are you saying that BrowserUk's two examples are equivalent behind the scenes with 5.20 and up? That is, copy-on-write being used to make the "temporary" data share the scalar's buffer?

        That sounded plausible, but I just tried it on 5.22 with these results, using this code in my REPL:

        { my $s; do{ local( @ARGV, $/)='1gb.db'; $s = <> }; <STDIN>; my $t = do{ local( @ARGV, $/)='1gb.db'; <> }; <STDIN> };;

        Note how the first bump in memory, to 1.2GB, remains at that level after the first burst of IO finishes.

        It then climbs to 2.4GB for the second burst of IO, and then climbs again to 3.5GB immediately after the IO stops.

        I.e. that last jump is when the temporary buffer within the do block is copied into the target scalar $t.

        (Perhaps the IsCOW optimization only operates on *nix? I'm on Windows.)
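        Not part of the thread, but one way to check whether a given build actually gives the slurped scalar a COW buffer is Devel::Peek (a core module): on perls where the optimization applies, the FLAGS line of its Dump output includes IsCOW. A minimal sketch, assuming some readable $file; the file name here is hypothetical:

        use strict;
        use warnings;
        use Devel::Peek;   # core; Dump() prints an SV's internals to STDERR

        my $file = 'test.txt';   # hypothetical file name for illustration

        my $s = do { open my $fh, '<', $file or die "$file: $!"; local $/; <$fh> };

        # On perls with COW enabled (the default since 5.20), the FLAGS line
        # of this output should include IsCOW, meaning the string buffer can
        # be shared on assignment instead of copied.
        Dump($s);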


Re: Any downsides to this slurp idiom?
by tybalt89 (Monsignor) on Jun 22, 2018 at 12:51 UTC

    You're just being paranoid :)

    I often use

    my $data = do { local(@ARGV, $/) = $file; <> };

    Similar but shorter :)
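    For readers who haven't seen the trick, here is a commented copy of the same line (the comments are annotations, not tybalt89's). The list assignment gives everything to the greedy @ARGV and leaves $/ undef:

    my $data = do {
        local ( @ARGV, $/ ) = $file;  # @ARGV becomes ($file); $/ becomes undef (slurp mode)
        <>;                           # the magic ARGV handle opens $file and reads it whole
    };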

Re: Any downsides to this slurp idiom?
by talexb (Chancellor) on Jun 22, 2018 at 13:36 UTC

    That reads pretty cleanly to me ... you're slurping an entire file into a scalar. But then I've been around the language long enough that I recognize what you're doing. When I'm unsure whether someone else will understand a line of somewhat tricky Perl, I just put a comment above the line, then move on.

    PS: You could also use autodie; to make the code even shorter.
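    For illustration (my sketch, not talexb's code), the idiom with autodie could look like this; the pragma is lexically scoped, so it covers the open inside the do block:

    use autodie;   # open() now dies on failure, with the file name in the message
    my $data = do { open my $fh, '<', $file; local $/; <$fh> };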

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Any downsides to this slurp idiom?
by Eily (Monsignor) on Jun 22, 2018 at 12:54 UTC

    One downside I see is that it's not easy to read. You're more likely to read

    open $fh, '<', $file or die $!; process($fh) if is_valid($fh);
    than
    die $! unless open $fh, '<', $file; is_valid($fh) and process($fh);
    because the first version puts the important information first. In that idiom, the important part (the assignment) is split between the two ends of the line.

    Also if you have something like this:

    my $data; { local $/; open my $fh, '<', $file or die $!; $data = <$fh>; }
    You can afford not to know what $/, $! and '<' mean, and still have a pretty good idea that a file is being opened, and the content written to $data (even if you don't understand exactly why it's written like that). In your idiom, you have one line with many harder concepts, and you can't really extract the easy part.

    I'm tempted to say the best solution would be to use a function that abstracts all that away, but honestly I never do it myself :P.
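    A minimal sketch of such a function, using only core Perl; the name slurp and the optional $layers parameter are my invention for illustration:

    use strict;
    use warnings;

    # Hypothetical helper: return the whole contents of a file as one string.
    # $layers is an optional PerlIO layer string such as ':encoding(UTF-8)'.
    sub slurp {
        my ( $file, $layers ) = @_;
        open my $fh, '<' . ( $layers // '' ), $file
            or die "Can't open '$file': $!";
        local $/;               # slurp mode: one <$fh> reads to end of file
        return scalar <$fh>;
    }                           # $fh is closed implicitly as it goes out of scope

    my $data = slurp('config.txt');   # hypothetical file name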

    it's bad style to rely on the implicit close
    Personally I even consider it good style, because if you actively rely on the implicit close, you have to focus on getting the scope of your handle right. The alternatives: either you close before the end of the scope, or you reuse the variable, in which case the explicit close adds nothing; or you close the handle but keep it around, which is only fine if you actually meant to do that and needed to.

Re: Any downsides to this slurp idiom?
by kcott (Archbishop) on Jun 23, 2018 at 08:47 UTC
Re: Any downsides to this slurp idiom?
by haj (Vicar) on Jun 22, 2018 at 14:18 UTC

    If you don't mind spending a few more bytes, then I'd always include the file name in the message:

    my $data = do { open my $fh, '<', $file or die "'$file': $!"; local $/; <$fh> };
Re: Any downsides to this slurp idiom? (updated)
by haukex (Archbishop) on Jun 30, 2018 at 19:34 UTC

    Thanks everyone for your replies! :-) For completeness, here are some slurping examples, incorporating various suggestions:

    • The basic version (with the improved error message first suggested by haj):
      my $data = do { open my $fh, '<', $file or die "$file: $!"; local $/; <$fh> };
    • Opening a file with an encoding (in this case UTF-8):
      my $data = do { open my $fh, '<:raw:encoding(UTF-8)', $file or die "$file: $!"; local $/; <$fh> };
    • A version that should use less memory, suggested by BrowserUk (see the discussion under his reply above; Copy-On-Write, available in newer Perls, may take care of this):
      my $data; { open my $fh, '<', $file or die "$file: $!"; local $/; $data = <$fh> };
    • This short version was first suggested by tybalt89. However, note that unlike the above examples, it does not die but only emits a warning if the file could not be opened, unless FATAL warnings are in effect; the minimum needed is use warnings FATAL=>'inplace'; (a sketch of that variant follows after the updates below). Update: fixed as per choroba's reply, thanks!
      my $data = do { local (*ARGV,$/); @ARGV=$file; <> };

    Minor edits for clarity.

    Update 2: Actually, I made a mistake in the last example when I first fixed it, it is now tested and correct.
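    Here is the sketch referred to above: the short version combined with lexically fatal 'inplace' warnings, so a failed open dies like the other examples. This is my combination of the pieces in this thread, not separately tested on every Perl version:

    my $data = do {
        use warnings FATAL => 'inplace';  # the <> open-failure warning is in this category
        local ( *ARGV, $/ );              # reset ARGV's state; undef $/ enables slurp mode
        @ARGV = ($file);
        <>;
    };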

      Note that localizing @ARGV might not be enough (because eof(ARGV) might remain true). So:
      my $data = do { local ( *ARGV, $/ ); @ARGV = ("$file"); <> };

Re: Any downsides to this slurp idiom?
by jimpudar (Pilgrim) on Jun 22, 2018 at 16:15 UTC

    There are a lot of good answers to your question here already, so I thought I would ask another tangentially related one.

    Why not use File::Slurp?

    Best,

    Jim

    πάντων χρημάτων μέτρον έστιν άνθρωπος. ("Man is the measure of all things.")

        Huh, very interesting. Sounds like I will probably be switching over to File::Slurper from now on.

        Thanks!

        Jim

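        For reference, a File::Slurper sketch (assuming the module is installed); read_text croaks on failure and decodes as UTF-8 by default, while read_binary returns raw bytes:

        use File::Slurper qw(read_text read_binary);

        my $text  = read_text($file);     # decoded text, UTF-8 by default
        my $bytes = read_binary($file);   # raw bytes, no decoding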

      Why not use File::Slurp?

      One reason is what hippo said, the other being that this is IMO one of those cases where pure Perl does everything I need - I can read lines in list context and I can use layers like :raw, :crlf, and :encoding(...) - there's really no need to load a module. (Unless I'm doing additional stuff, like filename manipulation, in which case I usually use Path::Class, which has slurping and spewing built in.)
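      For comparison, a minimal Path::Class sketch; slurp and spew are its file methods, and the iomode option selects the PerlIO layers (the file names here are hypothetical):

      use Path::Class qw(file);

      my $data = file('in.txt')->slurp( iomode => '<:encoding(UTF-8)' );
      file('out.txt')->spew( $data );   # write it back out elsewhere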

Re: Any downsides to this slurp idiom?
by jbodoni (Monk) on Jun 24, 2018 at 14:35 UTC
    You're not being any more paranoid than me. :) I always fear that the file I'm reading (usually generated by another program written by someone else) will one day be larger than available memory. As a result I almost always process input files line by line. If I need to make a temporary buffer because I'm processing a chunk of data (defined by line count, byte count, start/end markers, whatever), I'm okay with that.
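    For contrast with slurping, the line-by-line pattern being described might look like this sketch; memory use stays bounded by the longest line (process() is a hypothetical per-line handler):

    open my $fh, '<', $file or die "$file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        process($line);   # hypothetical: handle one line at a time
    }
    close $fh or die "close '$file': $!";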
Re: Any downsides to this slurp idiom?
by Anonymous Monk on Jun 22, 2018 at 14:19 UTC

    Well, I think that $. does not get reset unless you do an explicit close, but if you are slurping you are not interested in line numbers anyway.
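    A small sketch of that behaviour; per perlvar, an explicit close() resets the line counter, while an implicit close at the end of the scope does not:

    open my $fh, '<', $file or die "$file: $!";
    1 while <$fh>;              # read to EOF; $. counts the lines
    print "read $. lines\n";    # $. still holds the last line number here
    close $fh;                  # explicit close resets $. for this handle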
