isync has asked for the wisdom of the Perl Monks concerning the following question:

Is there any difference between:

local( $/, *FILE );
open(FILE, "<$file") or die "could not read file: $!";
binmode FILE;
my $raw = <FILE>;
close(FILE);
and
open(FILE, $file) or die "could not read file: $!";
binmode FILE;
my $raw = do { local $/; <FILE> };
close(FILE);

Replies are listed 'Best First'.
Re: slurping styles
by ikegami (Patriarch) on Aug 01, 2008 at 10:55 UTC
    Yes.
    • The do version uses twice as much memory.
    • The second uses a global variable (*FILE{IO}) without localizing it.

    Cleaned up (with do ⇒ elegant):

    my $raw = do {
        open(my $fh, '<', $file)
            or die("Could not read file: $!\n");
        binmode $fh;
        local $/;
        <$fh>
    };

    Cleaned up (without do ⇒ saves memory):

    my $raw;
    {
        open(my $fh, '<', $file)
            or die("Could not read file: $!\n");
        binmode $fh;
        local $/;
        $raw = <$fh>;
    }
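
    To illustrate the point about the unlocalized global handle, here is a minimal sketch. The helper names slurp_global and slurp_localized are hypothetical, and in-memory filehandles stand in for real files so the example is self-contained.

```perl
use strict;
use warnings;

my $outer_text = "outer 1\nouter 2\nouter 3\n";
my $inner_text = "inner data\n";

# Hypothetical helper: slurps through the global handle FILE without
# localizing it, as in the second snippet of the question.
sub slurp_global {
    my ($source) = @_;
    open(FILE, '<', $source) or die "could not read: $!";
    my $raw = do { local $/; <FILE> };
    close(FILE);
    return $raw;
}

# Hypothetical helper: same logic, but *FILE is localized first, so the
# caller's FILE handle is restored when the sub returns.
sub slurp_localized {
    my ($source) = @_;
    local *FILE;
    open(FILE, '<', $source) or die "could not read: $!";
    my $raw = do { local $/; <FILE> };
    close(FILE);
    return $raw;
}

# The caller is in the middle of reading its own FILE handle.
open(FILE, '<', \$outer_text) or die "could not open: $!";
my $line1 = <FILE>;                          # "outer 1\n"

my $inner = slurp_localized(\$inner_text);
my $line2 = <FILE>;                          # caller's handle is intact

my $inner_again = slurp_global(\$inner_text);
# The helper reopened and closed the caller's FILE, so this read fails.
my $line3 = do { no warnings 'closed'; <FILE> };
```

    A lexical handle (open my $fh, ...) avoids the problem altogether, which is why the cleaned-up versions above use one.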
      Would you care to explain why the do version uses 2x as much memory?
      []s, HTH, Massa (κς,πμ,πλ)

        No can do. I don't know why.

        Update: Actually, it's quite basic. The assignment of do's return value causes it to be copied. You'll notice the same effect from subroutines.

        use Devel::Peek qw( Dump );

        {
            my $x = do { my $y = 'abcdef'; Dump($y); $y };
            Dump($x);
        }

        {
            my $x = sub { my $y = 'abcdef'; Dump($y); $y }->();
            Dump($x);
        }

        __END__
        SV = PV(0x226df4) at 0x226ce0
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,POK,pPOK)
          PV = 0x1822b04 "abcdef"\0
          CUR = 6
          LEN = 8
        SV = PV(0x226e24) at 0x226d34
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,POK,pPOK)
          PV = 0x187a93c "abcdef"\0    <-- new buffer
          CUR = 6
          LEN = 8
        SV = PV(0x226e0c) at 0x1830320
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,POK,pPOK)
          PV = 0x182c84c "abcdef"\0
          CUR = 6
          LEN = 8
        SV = PV(0x226e60) at 0x226d70
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,POK,pPOK)
          PV = 0x1855fac "abcdef"\0    <-- new buffer
          CUR = 6
          LEN = 8
        I'd think that the do block creates an anonymous lexical variable to hold the last value, and its content is being copied to the caller.

        Try to measure the memory for returning something like [ <$handle> ]. If my theory holds, it won't use twice as much memory then, because only the array reference is copied.

Re: slurping styles
by pjotrik (Friar) on Aug 01, 2008 at 10:35 UTC
    It is good practice to localize variables to the smallest block possible. Therefore, I prefer the second style. In the first case, if you expand your code and open another file within the same block, you may overlook that the $/ setting is still in effect.

    Update: I second the others who recommend using a lexical for the file handle.

Re: slurping styles
by jettero (Monsignor) on Aug 01, 2008 at 10:47 UTC
    In the second block, you never localize *FILE, which is a dubious practice anyway now that we can use this form:
    open my $fh, "<", $file or die "crap: $!";
    my $raw = do { local $/; <$fh> }; # sexy
    close $fh;

    Personally, my favorite slurping method these days is slurp() from File::Slurp. It's up to version 9_999.x, so you know it's good.

    -Paul

Re: slurping styles
by moritz (Cardinal) on Aug 01, 2008 at 10:55 UTC
    The difference is that in the first case $/ stays undef after your code (until the enclosing scope ends), thus affecting further operations (like chomp and reads).

    Thus the second form is recommended (better still with a lexical variable as the file handle).
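
    A minimal sketch of that scoping difference, using in-memory filehandles in place of real files (the variable names here are illustrative only):

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\n";

# Style 1: local $/ at block level stays in effect until the block ends,
# so every later read in the same block slurps as well.
my ($read_a, $read_b);
{
    local $/;
    open my $fh_a, '<', \$text or die "open failed: $!";
    $read_a = <$fh_a>;                 # whole content
    close $fh_a;

    open my $fh_b, '<', \$text or die "open failed: $!";
    $read_b = <$fh_b>;                 # also the whole content: $/ is still undef
    close $fh_b;
}

# Style 2: local $/ inside the do block is undone as soon as the block ends.
my $slurped = do {
    open my $fh, '<', \$text or die "open failed: $!";
    local $/;
    <$fh>;
};

# $/ is back to "\n" here, so this read returns a single line.
open my $fh_c, '<', \$text or die "open failed: $!";
my $line = <$fh_c>;
close $fh_c;
```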

Re: slurping styles
by jwkrahn (Abbot) on Aug 01, 2008 at 11:45 UTC

    Another way to do it:

    open my $FILE, '<:raw', $file
        or die "could not open $file: $!";
    -s $FILE == read $FILE, my $raw, -s $FILE
        or die "could not read $file: $!";
    close $FILE;
Re: slurping styles
by ysth (Canon) on Aug 02, 2008 at 02:37 UTC
    When you say <FILE>, the readline result is stored in a special variable called a "targ" (target) associated with that particular readline call (it's more complicated if there's recursion or threading) and the assignment copies the value. The special variable holds on to its allocated space for reuse the next time the readline is executed (if ever). This is why you see twice the space allocated in the do{} case.

    There are two optimizations that can prevent it. The first is that a number of operations that normally use a targ instead write directly into the destination scalar when their result is being assigned to it by scalar assignment. Compare the direct assignment vs. the assignment with an intermediary operation:

    $ perl -MO=Concise,-exec -we'my $foo = scalar <STDIN>'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v
    3  <#> gv[*STDIN] s
    4  <1> readline[t3] sK/1
    5  <0> padsv[$foo:1,2] sRM*/LVINTRO
    6  <2> sassign vKS/2
    7  <@> leave[1 ref] vKP/REFC
    -e syntax OK
    ~$ perl -MO=Concise,-exec -we'my $foo = <STDIN>'
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v
    3  <0> padsv[$foo:1,2] sRM*/LVINTRO
    4  <#> gv[*STDIN] s
    5  <1> readline[t3] sKS/1
    6  <@> leave[1 ref] vKP/REFC
    -e syntax OK
    (The scalar() operation itself is optimized away, but nevertheless interferes with the other optimization.) Note that the sassign operation disappears and readline takes the sv to read into as an extra argument (to which it is alerted by the extra S (STACKED) flag).

    The other optimization that prevents readline from using its targ is specific to readline. When you catenate onto a buffer, the readline and concatenation operations are joined into a single rcatline operation:

    $ perl -MO=Concise,-exec -we' $foo.=<STDIN>'
    Name "main::foo" used only once: possible typo at -e line 1.
    1  <0> enter
    2  <;> nextstate(main 1 -e:1) v
    3  <#> gvsv[*foo] s
    4  <#> rcatline[*STDIN] sS
    5  <@> leave[1 ref] vKP/REFC
    -e syntax OK
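
    Applied to slurping, the append form might look like the sketch below. It uses an in-memory handle as a stand-in for a real file. Whether this exact statement compiles to rcatline on your perl is an assumption worth checking with B::Concise, as above.

```perl
use strict;
use warnings;

# Stand-in for a file's contents, read through an in-memory handle.
my $content = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$content
    or die "could not open in-memory handle: $!";

my $raw = '';
{
    local $/;        # slurp mode for this block only
    $raw .= <$fh>;   # append form, described above as becoming a single rcatline op
}
close $fh;
```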
Re: slurping styles
by Anonymous Monk on Aug 01, 2008 at 11:46 UTC
    my $raw = slurp_file( $file );

    sub slurp_file {
        return do {
            local ( *ARGV, $/, $^I );
            use open qw' IN :bytes ';
            @ARGV = @_;
            <>;
        };
    }