Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Monks !

I'm relatively new to Perl and still in a phase of learning how things work. My question might sound stupid to you but I searched perldoc back and forth and googled through the web (including your great site) without finding an answer. So here's my question:

Short version: How do I portably and correctly open() a binary file ?


Long version or how I got all confused about that:

My PerlIO documentation states the following about the ":raw" layer:

The ":raw" layer is defined as being identical to calling "binmode($fh)" – the stream is made suitable for passing binary data i.e. each byte is passed as-is. The stream will still be buffered.

Having read that I thought the following two code fragments would be equivalent:

open(my $fh, '<', $filename); binmode($fh);
open(my $fh, '<:raw', $filename);

Because the second one is shorter (and I only have to check one function return value) I used that method in some scripts.
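
A minimal sketch of the two variants with their return values checked, assuming a scratch file created with File::Temp (the file and its contents are made up for illustration). It only shows that both variants deliver the same bytes; it says nothing about which layers end up on the stack.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file (an assumption for this sketch) holding bytes that a
# text-mode translation could alter on some platforms.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode($tmp)                        or die "binmode failed: $!";
print {$tmp} "\x00\x0D\x0A\x1A\xFF"  or die "print failed: $!";
close($tmp)                          or die "close failed: $!";

# Variant 1: open, then binmode(); two return values to check.
open(my $fh1, '<', $filename) or die "open failed: $!";
binmode($fh1)                 or die "binmode failed: $!";

# Variant 2: the layer is given in the open() mode; one return value to check.
open(my $fh2, '<:raw', $filename) or die "open failed: $!";

# Slurp both handles and compare the raw bytes.
my $bytes1 = do { local $/; <$fh1> };
my $bytes2 = do { local $/; <$fh2> };
close($fh1) or die "close failed: $!";
close($fh2) or die "close failed: $!";

print $bytes1 eq $bytes2 ? "same bytes\n" : "different bytes\n";
```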

So far so good. A couple of days ago I discovered the PerlIO::get_layers() function and, curious and naive as I am, I did the following:

$, = ' '; $\ = "\n";
open(FH, '<', '/dev/null');
print(PerlIO::get_layers(FH)); # --> unix perlio

As documented and expected, this is the default layer stack on a UNIX system.

open(FH, '<', '/dev/null');
binmode(FH);
print(PerlIO::get_layers(FH)); # --> unix perlio

So is this one. No problems yet, but look at the next one:

open(FH, '<:raw', '/dev/null');
print(PerlIO::get_layers(FH)); # --> unix

Big surprise (at least for a dummy like me): The ":perlio" layer mysteriously disappeared. And this mystery is the whole reason for this post.

Obviously, using ":raw" in open() is not the same as calling binmode() afterwards. Having no ":perlio" layer also means that the stream is unbuffered now, right ? (Which explains the results of this PerlIO Benchmark, which I found while searching for answers.)
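
To repeat this comparison on another system, the checks above could be collected into one script. This is only a sketch: layers_for() is a hypothetical helper, and File::Spec->devnull is used in place of a hard-coded '/dev/null' so the script is not tied to UNIX. No expected output is given because, as this thread shows, the result depends on the Perl version and platform.

```perl
use strict;
use warnings;
use File::Spec;

# File::Spec->devnull returns '/dev/null' on UNIX-like systems and 'nul' on Windows.
my $null = File::Spec->devnull;

# layers_for() is a hypothetical helper: open the null device with the given
# mode, optionally call binmode(), and return the resulting layer names.
sub layers_for {
    my ($mode, $call_binmode) = @_;
    open(my $fh, $mode, $null) or die "open('$mode') failed: $!";
    if ($call_binmode) {
        binmode($fh) or die "binmode failed: $!";
    }
    my @layers = PerlIO::get_layers($fh);
    close($fh) or die "close failed: $!";
    return @layers;
}

# Each case: label, open mode, whether to call binmode() afterwards.
my @cases = (
    [ 'open <',            '<',            0 ],
    [ 'open < + binmode',  '<',            1 ],
    [ 'open <:raw',        '<:raw',        0 ],
    [ 'open <:raw:perlio', '<:raw:perlio', 0 ],
);

for my $case (@cases) {
    my ($label, $mode, $call_binmode) = @$case;
    printf "%-20s => %s\n", $label, join(' ', layers_for($mode, $call_binmode));
}
```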

By the way, my Perl version is v5.8.4, but a friend of mine just confirmed the last code snippet on Perl v5.8.8.

Could someone please explain to me what I misunderstood so badly here. (Need not be a lengthy answer, a link or a hint to the right place in some Perl documentation would suffice.)

Thanks so much !

Re: Using ":raw" layer in open() vs. calling binmode()
by jdalbec (Deacon) on Jun 14, 2007 at 02:32 UTC
    This is perl, v5.8.6 built for darwin-thread-multi-2level
    On my system the first two snippets return stdio instead of unix perlio.
    open (FH, '<:raw:perlio', '/dev/null'); print(PerlIO::get_layers(FH)); # --> unix perlio
    Does that work for you?
Re: Using ":raw" layer in open() vs. calling binmode()
by jbert (Priest) on Jun 14, 2007 at 09:52 UTC
    I can confirm this:
    $ perl --version
    This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
    ...
    $ perl -MPerlIO -e 'open($fh, "<", "/dev/null"); print join(", ", PerlIO::get_layers($fh)), "\n";'
    unix, perlio
    $ perl -MPerlIO -e 'open($fh, "<:raw", "/dev/null"); print join(", ", PerlIO::get_layers($fh)), "\n";'
    unix
    $ perl -MPerlIO -e 'open($fh, "<", "/dev/null"); binmode $fh; print join(", ", PerlIO::get_layers($fh)), "\n";'
    unix, perlio
    Looks like a bug to me, since the behaviour appears to contradict the description in the docs (either a doc bug or a code bug).

    For what it's worth, I tend to use the binmode $fh; variant, but that's a habit formed in pre-layer times.

Re: Using ":raw" layer in open() vs. calling binmode()
by Util (Priest) on Jun 14, 2007 at 12:21 UTC

    Three data points (Linux, OS X, and Win32) that do little to clarify the confusion.

    Summary:

    Perl   OS      open mode        layers
    5.8.5  linux   <             => unix perlio
    5.8.5  linux   < binmode     => unix perlio
    5.8.5  linux   <:raw         => unix
    5.8.5  linux   <:raw:perlio  => unix perlio
    5.8.6  darwin  <             => stdio
    5.8.6  darwin  < binmode     => stdio
    5.8.6  darwin  <:raw         => unix
    5.8.6  darwin  <:raw:perlio  => unix perlio
    5.8.6  win32   <             => unix crlf
    5.8.6  win32   < binmode     => unix crlf
    5.8.6  win32   <:raw         => unix
    5.8.6  win32   <:raw:perlio  => unix perlio

    Detail:

Re: Using ":raw" layer in open() vs. calling binmode()
by Anonymous Monk on Jun 14, 2007 at 19:46 UTC

    First of all, thank you very much for all your responses !

    As jbert pointed out in his reply, this seems to be either a bug in PerlIO itself or erroneous (or at least misleading) documentation.

    So my conclusion is:

    To open() a file containing binary data in a portable manner without losing buffering, use the sequence

    open(my $fh, '<', $filename); binmode($fh);

    and avoid

    open(my $fh, '<:raw', $filename)

    however tempting it may look at first.
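
    A minimal sketch of that conclusion with both return values checked. open_binary() is a hypothetical helper name, and the scratch file exists only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# open_binary() is a hypothetical helper wrapping the open-then-binmode
# sequence and dying with a message if either step fails.
sub open_binary {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open '$filename': $!";
    binmode($fh)                 or die "Cannot binmode '$filename': $!";
    return $fh;
}

# Scratch file (an assumption for this sketch) containing every byte value once.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode($tmp) or die "binmode failed: $!";
my $payload = join '', map { chr($_) } 0 .. 255;
print {$tmp} $payload or die "print failed: $!";
close($tmp)           or die "close failed: $!";

# Read the file back in fixed-size chunks through the binary handle.
my $fh   = open_binary($filename);
my $data = '';
while (1) {
    my $n = read($fh, my $chunk, 64);
    die "read failed: $!" unless defined $n;
    last if $n == 0;    # end of file
    $data .= $chunk;
}
close($fh) or die "close failed: $!";

print length($data), " bytes read\n";
```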


    Some additional comments to my original post:

      Depending on what you want to do with your 'raw' filehandle, using:

      open(my $fh, '<', $filename); binmode($fh);
      ...may not be appropriate (at least on windows).

      If you want to read raw bytes from a filehandle (for example to look for a BOM) and _then_ add an encoding layer, you need to ensure that there isn't a :crlf layer hanging around to begin with. The sequence above leaves one in place (it may not be enabled, but it's still there *). This matters because :crlf does not work nicely with encodings, e.g. it breaks tell().

      So, the method I use is:

      open(my $fh, "<:raw:perlio", $file);
      # read first few bytes for a BOM or ASCII in utf(16|32)
      # to determine encoding
      binmode $fh, ":encoding($encoding)";
      * Consider:
      open $fh, "<", $^X or die;
      binmode $fh;
      print "1: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
      close $fh;
      open $fh, "<:raw:perlio", $^X or die;
      print "2: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
      close $fh;
      gives:

      1: unix 4195328 crlf 4195328
      2: unix 4195328 perlio 4195328
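
      The BOM-sniffing method described above could be filled in roughly as follows. This is only a sketch under assumptions: open_with_bom_detection() and its fallback argument are hypothetical, only BOMs are recognised (not BOM-less ASCII in UTF-16/32), and the BOM is skipped with seek() before the encoding layer is pushed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# open_with_bom_detection() is a hypothetical helper: open the file raw but
# buffered, inspect up to four bytes for a BOM, then push an encoding layer.
sub open_with_bom_detection {
    my ($file, $fallback) = @_;
    open(my $fh, '<:raw:perlio', $file) or die "Cannot open '$file': $!";

    my $got = read($fh, my $head, 4);
    die "read failed: $!" unless defined $got;

    # The UTF-32 checks come first: the UTF-32LE BOM starts with the UTF-16LE BOM.
    my ($encoding, $bom_len) = ($fallback, 0);
    if    ($head =~ /^\xEF\xBB\xBF/)     { ($encoding, $bom_len) = ('UTF-8',    3) }
    elsif ($head =~ /^\xFF\xFE\x00\x00/) { ($encoding, $bom_len) = ('UTF-32LE', 4) }
    elsif ($head =~ /^\x00\x00\xFE\xFF/) { ($encoding, $bom_len) = ('UTF-32BE', 4) }
    elsif ($head =~ /^\xFF\xFE/)         { ($encoding, $bom_len) = ('UTF-16LE', 2) }
    elsif ($head =~ /^\xFE\xFF/)         { ($encoding, $bom_len) = ('UTF-16BE', 2) }

    # Position just past the BOM (or at the start), then decode from there on.
    seek($fh, $bom_len, 0)               or die "seek failed: $!";
    binmode($fh, ":encoding($encoding)") or die "binmode failed: $!";
    return ($fh, $encoding);
}

# Scratch file (an assumption for this sketch): UTF-8 BOM followed by "h", e-acute, newline.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode($tmp) or die "binmode failed: $!";
print {$tmp} "\xEF\xBB\xBF", "h\xC3\xA9\n" or die "print failed: $!";
close($tmp) or die "close failed: $!";

my ($fh, $encoding) = open_with_bom_detection($file, 'UTF-8');
my $line = <$fh>;
close($fh) or die "close failed: $!";

print "detected encoding: $encoding\n";
```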

        Thanks for your advice !

        It's interesting that binmode() only disables the :crlf layer but doesn't remove it. (I just discovered that this is actually documented in PerlIO.)

        Does it still interfere with tell() even when it's disabled ? (perlport says: "If you use binmode on a file, however, you can usually seek and tell with arbitrary values in safety.")

        I'm just curious, what happens if you try these on Windows:

        open $fh, "<", $^X or die;
        binmode $fh, ":raw"; # perldoc says: identical to 'binmode $fh'
        print "3: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
        close $fh;
        open $fh, "<", $^X or die;
        binmode $fh, ":raw:perlio"; # seems not useful (see below)
        print "4: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
        close $fh;

        On UNIX that gives:

        3: unix 4195328 perlio 4195328
        4: unix 4195328 perlio 4195328 perlio 4195328

        I know this discussion should probably go to a Perl development group, but let me just explain where I'm having trouble understanding things here:

        If one has to use open($fh, '<:raw:perlio', $file) to get a clean and buffered binary stream, this unconditionally forces a :perlio layer onto the stack, which in turn inserts a :unix layer below it.

        Now, what about MacOS where Perl uses :stdio by default ? (I know far too little about MacOS but it must be faster than :unix:perlio there.) Also if someone wants to change the default layers via the PERLIO environment variable (for whatever reason), it would be blithely ignored on those streams.
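
        One possible compromise, offered purely as a sketch: keep the default layers (and therefore :stdio or a PERLIO setting) whenever binmode() leaves no :crlf layer on the stack, and fall back to '<:raw:perlio' only when one is present. open_binary_keep_defaults() is a hypothetical helper, and whether this trade-off is the right one is an assumption, not something the PerlIO documentation prescribes.

```perl
use strict;
use warnings;
use File::Spec;

# open_binary_keep_defaults() is a hypothetical helper. It first tries the
# default layer stack plus binmode(); if a crlf layer is still on the stack
# (as reported for Windows in this thread), it reopens with '<:raw:perlio'.
sub open_binary_keep_defaults {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open '$filename': $!";
    binmode($fh)                 or die "Cannot binmode '$filename': $!";

    if (grep { $_ eq 'crlf' } PerlIO::get_layers($fh)) {
        close($fh) or die "close failed: $!";
        open($fh, '<:raw:perlio', $filename)
            or die "Cannot reopen '$filename': $!";
    }
    return $fh;
}

# Demonstration on the null device, so no real file is needed.
my $fh     = open_binary_keep_defaults(File::Spec->devnull);
my @layers = PerlIO::get_layers($fh);
close($fh) or die "close failed: $!";

print "layers: @layers\n";
```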

Re: Using ":raw" layer in open() vs. calling binmode()
by jdalbec (Deacon) on Jun 14, 2007 at 23:30 UTC
    I don't have Perl on Windows, so I can't check whether all binary characters are read correctly, but I can confirm that on Mac OS X 10.4.9 Perl 5.8.6 <:raw:perlio does not lose buffering. I copied the PerlIO benchmark you referenced and added raw:perlio. The results are below.

      Thanks for testing !

      Yes, if you explicitly specify the :perlio (or on MacOS maybe better :stdio) layer the stream will be buffered. What I don't understand (yet ?) is why a simple :raw removes all layers, including the existing :perlio (or in your case :stdio) layer.

      But I have the feeling that this behaviour may be intentional and that only the documentation is a bit misleading here.