chester has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I was using File::Slurp and I noticed newlines were constantly being added to the end of every line after multiple reads/writes of the same file. The module appears (to me) to be buggy in Windows.

use strict;
use warnings;
use File::Slurp;
use Data::Dump qw{dump};

undef $/;

my $file_slurp = read_file('file');

open my $FILE, '<', 'file' or die $!;
my $normal = <$FILE>;
close $FILE;

$normal eq $file_slurp ? print "Yep.\n" : print "Nope.\n";

dump($file_slurp);
dump($normal);

file:

Line
Line2
Line3

output:

Nope.
"Line\r\nLine2\r\nLine3\r\n\r\n"
"Line\nLine2\nLine3\n\n"

I undef $/ here because it's the only thing I could think to try; File::Slurp mentions using it in passing, but it doesn't help. Putting the "normal" version in its own block and using local on $/ has the same result. I haven't tested this in Linux, but the difference between line terminators in Windows and Linux (and everything else) strikes me as a possible source of error here.

Two questions:

1) Can anyone confirm this behavior? Is it normal? I know Perl6 will have slurp; is there some nuance to slurp which causes this to be expected behavior?

2) Is it worth bothering to use a module to slurp files (in Perl5)? It's as simple as local-izing $/ and reading into a scalar. If I'm really feeling lazy I can throw it into my own slurp sub. Will Perl6's slurp have any more to offer?
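For reference, the hand-rolled approach described in question 2 might look like the sketch below. The sub name `slurp` is a hypothetical helper, not part of any module, and the handle is opened with Perl's default layers, so on Windows the usual text-mode translation would be expected to apply.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): read a whole file into a scalar.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;               # undefine the record separator for this scope only
    my $content = <$fh>;    # with $/ undefined, this reads to end of file
    close $fh;
    return $content;
}
```

Calling `my $text = slurp('file');` then returns the file's contents as a single string, and `$/` is restored when the sub returns.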

Re: File::Slurp bug? Should I bother?
by itub (Priest) on Sep 07, 2005 at 15:37 UTC
    Is it worth bothering to use a module to slurp files (in Perl5)?

    That's for you to decide, but IMO, in general, no. I feel that adding another dependency for the convenience of saving two lines of code is not a good idea, especially if you intend to distribute the code. However, I'd feel free to use it for an internal project if I found it convenient enough (which I don't) or if I needed better performance and found that this module provided it.

Re: File::Slurp bug? Should I bother?
by ikegami (Patriarch) on Sep 07, 2005 at 15:48 UTC

    First, let's forget about $/. $/ has no effect on read_file (in the latest version of File::Slurp) unless you save the result of the function directly into an array, or if you use the array_ref => 1 option.

    Second, you say it's adding newlines, but what it's actually doing is not removing carriage returns, as if the file had been opened in binary mode. What version of File::Slurp are you using? Maybe an older version used binary mode by default. In the latest version, binary mode is only used if you pass the binary => 1 option.
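    To make the distinction concrete: data read in binary mode keeps its CRLF pairs, and they can be stripped afterwards. This is a minimal sketch, not something File::Slurp does for you; `normalize_newlines` is a hypothetical helper name, and the sample string simply mimics a three-line CRLF file.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: convert CRLF line endings to bare LF.
    sub normalize_newlines {
        my ($text) = @_;
        $text =~ s/\r\n/\n/g;
        return $text;
    }

    # Sample data standing in for a file read without CRLF translation.
    my $raw = "Line\r\nLine2\r\nLine3\r\n";

    print length($raw), "\n";                        # 20 characters, CRs included
    print length(normalize_newlines($raw)), "\n";    # 17 once the CRs are gone
    ```

    The three-character difference is one carriage return per line, which is the same kind of discrepancy a length comparison between a binary and a text read would show.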

    I don't have File::Slurp installed, so I haven't run your snippet, but I don't see anything in the source (of the latest version) that explains what you see.

Re: File::Slurp bug? Should I bother?
by jmcnamara (Monsignor) on Sep 07, 2005 at 15:50 UTC

    Does binmodeing the filehandle fix the problem?

    ...
    my $file_slurp = read_file('file', binmode => ':raw');
    ...

    --
    John.

      Isn't that backwards? I believe he wants to remove the "added" Carriage Returns, not preserve them.

        Yes, it is more than a little backwards. Binmodeing the $FILE filehandle gives the same results in both cases.
        ...
        binmode $FILE;
        ...
        __END__
        Prints:
        "Line\r\nLine2\r\nLine3\r\n"
        "Line\r\nLine2\r\nLine3\r\n"
        Yep.

        In which case you would also have to binmode the output filehandles in order not to add extra carriage returns!!

        HOWEVER, it doesn't seem right that the carriage returns are in the File::Slurp data without binmode being set. Is this a function of using sysread?

        I can't check at the moment; I'll look at it again later.

        --
        John.

Re: File::Slurp bug? Should I bother?
by chester (Hermit) on Sep 07, 2005 at 16:07 UTC
    Thanks for the replies so far. My version is 9999.09, which is current. Passing the binmode parameter doesn't change anything.

    Here is another test:

    use strict;
    use warnings;
    use File::Slurp;

    my $file_slurp = read_file('file');

    my $normal;
    {
        local $/;
        open my $FILE, '<', 'file' or die $!;
        $normal = <$FILE>;
        close $FILE;
    }

    write_file('slurp', $file_slurp);

    open my $NORMAL, '>', 'normal' or die $!;
    print $NORMAL $normal;
    close $NORMAL;

    open my $SLURP_NORMAL, '>', 'slurp_normal' or die $!;
    print $SLURP_NORMAL $file_slurp;
    close $SLURP_NORMAL;

    Doing this, both normal and slurp end up with the file identical to the original, but slurp_normal has the "added" newlines. So something in File::Slurp is "correcting" whatever it did to begin with when I use write_file. I'm puzzled.

      I installed File::Slurp and I can't reproduce your results. What do you get when you run the following? File::Slurp should remove carriage returns by default.

      use strict;
      use warnings;
      use File::Slurp qw( read_file );

      my $f = 'testdata';

      {
          open(my $fh, '>', $f) or die $!;
          print $fh ("line1\n");
          print $fh ("line2\n");
          print $fh ("line3\n");
      }

      {
          print("Disk size: ");
          print((stat($f))[7], "\n");
          print("\n");

          print("Inlined, binmode: ");
          print(length(do { local $/; open(my $fh, '<', $f) or die $!; binmode($fh); <$fh> }), "\n");
          print("Slurp, binmode: ");
          print(length(read_file($f, binmode => 1)), "\n");
          print("\n");

          print("Inlined, not binmode: ");
          print(length(do { local $/; open(my $fh, '<', $f) or die $!; <$fh> }), "\n");
          print("Slurp, not binmode: ");
          print(length(read_file($f, binmode => 0)), "\n");
          print("\n");

          print("Slurp: ");
          print(length(read_file($f)), "\n");
      }

      print("\n");
      print("\n");

      {
          my $inline = do { local $/; open(my $fh, '<', $f) or die $!; <$fh> };
          my $slurp = read_file($f);
          if ($inline eq $slurp) {
              print("Slurp is indentical to inlined version.\n");
          } else {
              print("Slurp is different from inlined version.\n");
          }
      }

      You should get the following, exactly:

      Disk size: 21

      Inlined, binmode: 21
      Slurp, binmode: 21

      Inlined, not binmode: 18
      Slurp, not binmode: 18

      Slurp: 18


      Slurp is indentical to inlined version.

      Environment:

      >perl -MFile::Slurp -le "print $File::Slurp::VERSION"
      9999.09

      >perl -v
      This is perl, v5.6.1 built for MSWin32-x86-multi-thread
      (with 1 registered patch, see perl -V for more detail)
      ...

        Disk size: 21

        Inlined, binmode: 21
        Slurp, binmode: 21

        Inlined, not binmode: 18
        Slurp, not binmode: 21

        Slurp: 21


        Slurp is different from inlined version.

        C:\>perl -MFile::Slurp -le "print $File::Slurp::VERSION"
        9999.09

        C:\>perl -v
        This is perl, v5.8.6 built for MSWin32-x86-multi-thread
        (with 3 registered patches, see perl -V for more detail)

        Blarg, OK, this may be a problem with sysread.

        use strict;
        use warnings;
        use Fcntl qw{:DEFAULT};
        use Data::Dump qw{dump};

        sysopen my $FILE, 'file', O_RDONLY or die $!;

        my ($temp, $buf);
        while(1) {
            my $read_cnt = sysread ($FILE, $temp, 1024);
            $buf .= $temp;
            last unless $read_cnt;
        }

        print dump($buf);
        This gives me the faulty newlines. I think this is a bug in my Perl build. Google turned up some (vague) results to that effect. I assume sysread is supposed to do CRLF translation to \n, just like <>? I checked, and the layers in effect on this filehandle are ("unix", "crlf"). Should be working.
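
        For anyone wanting to repeat the layer check, one way to do it is with the built-in PerlIO::get_layers (available from Perl 5.8). This is only a sketch: `layers_for` is a hypothetical helper name, and it reports which layers are attached to the handle, not whether sysread actually applies them.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: return the PerlIO layers on a freshly opened read handle.
        sub layers_for {
            my ($path) = @_;
            open my $fh, '<', $path or die "Cannot open '$path': $!";
            my @layers = PerlIO::get_layers($fh);   # e.g. ("unix", "crlf") as reported above
            close $fh;
            return @layers;
        }
        ```

        Something like `print join(', ', layers_for('file')), "\n";` prints the layer names in order, which makes it easy to compare builds.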