in reply to Using ":raw" layer in open() vs. calling binmode()

First of all, thank you very much for all your responses!

As jbert pointed out in his reply, this seems to be either a bug in PerlIO itself or erroneous (or at least misleading) documentation.

So my conclusion is:

To open() a file containing binary data in a portable manner without losing buffering, use the sequence

open(my $fh, '<', $filename); binmode($fh);

and avoid

open(my $fh, '<:raw', $filename)

however tempting it may look at first.
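
As a minimal sketch of that recommendation (the helper name slurp_binary is hypothetical and only for illustration), reading a whole binary file could look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: return the complete contents of a file as raw bytes.
sub slurp_binary {
    my ($filename) = @_;

    # Open with the platform's default layers, then switch to binary mode.
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    binmode($fh);    # turns off CRLF translation; the buffering layer stays

    local $/;        # slurp mode: read everything in one go
    my $data = <$fh>;
    close($fh);
    return $data;
}
```

Calling slurp_binary($filename) should return the bytes unchanged on any platform, including sequences such as "\r\n" that a text-mode read on Windows would translate.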


Some additional comments to my original post:

The "short version" of my question should probably have been written as:

How do I portably and correctly open() a binary file without losing buffering?

I think the fact that the stream is unbuffered when specifying (only) the ":raw" layer in open() is quite important.

Also: my concern was not which layers actually result from these two methods, but rather that they give different results on the same machine. I should have made that clearer. Sorry!

Thanks to all your tests, we now know that the behavior (be it a bug or not) is at least consistent across platforms and (some) Perl versions.

I'm also aware that UNIX treats binary data natively, so there is no need to call binmode() at all. So, if you will, my question was academic rather than a real coding problem. (I hope that I didn't misuse this forum, though.)

Finally, my apologies for my possibly bad (or weird?) English; it's not my native language.

Re^2: Using ":raw" layer in open() vs. calling binmode()
by Anonymous Monk on Jun 15, 2007 at 09:37 UTC
    Depending on what you want to do with your 'raw' filehandle, using:

    open(my $fh, '<', $filename); binmode($fh);
    ...may not be appropriate (at least on Windows).

    If you want to read raw bytes from a filehandle (for example, to look for a BOM) and _then_ add an encoding layer, you need to ensure that there isn't a :crlf layer hanging around to begin with. The sequence above leaves one in place (it may not be enabled, but it's still there *). This matters because :crlf does not work nicely with encodings, e.g. it breaks tell().

    So, the method I use is:

    open(my $fh, "<:raw:perlio", $file);
    # read first few bytes for a BOM or ASCII in utf(16|32)
    # to determine encoding
    binmode $fh, ":encoding($encoding)";

    * Consider:
    open $fh, "<", $^X or die;
    binmode $fh;
    print "1: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
    close $fh;

    open $fh, "<:raw:perlio", $^X or die;
    print "2: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
    close $fh;
    gives:

    1: unix 4195328 crlf 4195328
    2: unix 4195328 perlio 4195328
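
The BOM-sniffing approach described in the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name open_with_bom is hypothetical, UTF-8 is an assumed fallback when no BOM is found, and the "ASCII in utf(16|32)" heuristic mentioned in the reply is not implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file as raw bytes, look for a BOM,
# then push the matching :encoding layer onto the handle.
sub open_with_bom {
    my ($file) = @_;

    open(my $fh, '<:raw:perlio', $file) or die "Cannot open $file: $!";
    my $n = read($fh, my $head, 4);
    die "Cannot read $file: $!" unless defined $n;

    # Assumed fallback when no BOM is present.
    my ($encoding, $bom_len) = ('UTF-8', 0);

    # Test the 4-byte BOMs first: the UTF-32LE BOM begins with the
    # same two bytes as the UTF-16LE BOM.
    if    ($head =~ /^\xFF\xFE\x00\x00/) { ($encoding, $bom_len) = ('UTF-32LE', 4) }
    elsif ($head =~ /^\x00\x00\xFE\xFF/) { ($encoding, $bom_len) = ('UTF-32BE', 4) }
    elsif ($head =~ /^\xEF\xBB\xBF/)     { ($encoding, $bom_len) = ('UTF-8',    3) }
    elsif ($head =~ /^\xFF\xFE/)         { ($encoding, $bom_len) = ('UTF-16LE', 2) }
    elsif ($head =~ /^\xFE\xFF/)         { ($encoding, $bom_len) = ('UTF-16BE', 2) }

    # Position the handle just past the BOM before decoding starts.
    seek($fh, $bom_len, 0) or die "Cannot seek in $file: $!";
    binmode($fh, ":encoding($encoding)") or die "Cannot set encoding: $!";

    return ($fh, $encoding);
}
```

A caller would use it as my ($fh, $encoding) = open_with_bom($file); and then read decoded text from $fh as usual.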

      Thanks for your advice!

      It's interesting that binmode() only disables the :crlf layer but doesn't remove it. (I just discovered that this is actually documented in PerlIO.)

      Does it still interfere with tell() even when it's disabled? (perlport says: "If you use binmode on a file, however, you can usually seek and tell with arbitrary values in safety.")

      I'm just curious, what happens if you try these on Windows:

      open $fh, "<", $^X or die;
      binmode $fh, ":raw";           # perldoc says: identical to 'binmode $fh'
      print "3: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
      close $fh;

      open $fh, "<", $^X or die;
      binmode $fh, ":raw:perlio";    # seems not useful (see below)
      print "4: ", join(' ', PerlIO::get_layers($fh, details => 1)), "\n";
      close $fh;

      On UNIX that gives:

      3: unix 4195328 perlio 4195328
      4: unix 4195328 perlio 4195328 perlio 4195328

      I know this discussion should probably go to a Perl development group, but let me just explain where I'm having trouble understanding things here:

      If one has to use open($fh, '<:raw:perlio', $file) to get a clean and buffered binary stream this unconditionally forces a :perlio layer onto the stack, which in turn inserts a :unix layer below it.

      Now, what about MacOS, where Perl uses :stdio by default? (I know far too little about MacOS, but it must be faster than :unix:perlio there.) Also, if someone wants to change the default layers via the PERLIO environment variable (for whatever reason), that setting would be blithely ignored on those streams.

        > Does it (a :crlf layer) still interfere with tell() even when it's disabled?

        No. But if you later add a :crlf layer (via binmode), it can end up enabled at the wrong 'level'. This is a common mistake when using utf(16|32), e.g.

        # wrong way to write utf16
        open $fout, ">", $file or die;
        binmode $fout, ":encoding(utf16le)" or die;
        print $fout "abc\n123\n\n";
        close $fout;

        This results in each newline being output as the bytes "0D 0A 00", because the :crlf layer sits below the encoding and turns the 0A byte of the UTF-16LE newline (0A 00) into 0D 0A. Oops. What you need is:

        # OK
        open $fout, ">:raw:encoding(utf16le):crlf", $file or die;
        # etc.

        Basically, there are known bugs with mixing :encoding and :crlf layers, and I suggest that if you need both, you test well to ensure you're not hitting them. (Also check the Perl bug database.)
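
To check the layer ordering discussed above, a small sketch along these lines can help. The helper name write_utf16le_crlf is hypothetical; the layer string is the one from the "OK" example, with the encoding spelled UTF-16LE.

```perl
use strict;
use warnings;

# Hypothetical helper: write lines as UTF-16LE text with CRLF line endings.
# The :crlf layer is pushed on top of :encoding, so "\n" becomes "\r\n"
# while the data is still characters, and only then is it encoded.
sub write_utf16le_crlf {
    my ($file, @lines) = @_;

    open(my $fout, '>:raw:encoding(UTF-16LE):crlf', $file)
        or die "Cannot open $file: $!";
    print {$fout} "$_\n" for @lines;
    close($fout) or die "Cannot close $file: $!";
    return;
}
```

Dumping the resulting file in hex after something like write_utf16le_crlf($file, "abc", "123") is a quick way to confirm that every line ends in "0D 00 0A 00" rather than "0D 0A 00".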

        On Windows, your code gave me:

        3: unix 4195328 crlf 4195328
        4: unix 4195328 crlf 4195328 perlio 4195328

        > If one has to use open($fh, '<:raw:perlio', $file) to get a clean and buffered binary stream this unconditionally forces a :perlio layer onto the stack, which in turn inserts a :unix layer below it.

        NB. my original point was about creating a raw byte stream that you *then* wanted to add :encoding layers to.

        NB. the reason for the :perlio in open $fin, "<:raw:perlio" is to make it possible to use -B/-T on the filehandle. Consider:

        C:\>perl -we "open $fin, '<:raw', $^X or die; die if -B $fin
        -T and -B not implemented on filehandles at -e line 1.

        C:\>perl -we "open $fin, '<:raw:perlio', $^X or die; die if -B $fin
        Died at -e line 1.

        If all you want *is* a binary stream, then these two:

        open $fin1, "<", $foo or die;
        binmode $fin1;

        open $fin2, "<:raw", $bar or die;

        are equivalent (and you can forget all my ramblings :-)