BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way of persuading Perl to use binmode on files it opens via @ARGV?

In particular, when using Juerd's Cheap idioms?


Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: binmode for @ARGV
by Aristotle (Chancellor) on Jan 26, 2003 at 01:05 UTC
    Try binmode ARGV; - ARGV is the magical filehandle associated with @ARGV. This association is also the reason Juerd's snippet won't work reliably as is.

    Makeshifts last the longest.

      Unfortunately, applying binmode to ARGV before the file is opened doesn't persist once perl uses @ARGV to open the file :(. Presumably the mode is reset during the open.

      I guess that renders Juerd's idiom unusable for binary files unless there is some way of interjecting between the open and the read done by <>.


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

        I guess that renders Juerd's idiom unusable for binary files. . .

        Only on already unusable platforms.

        sauoq ducks.

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: binmode for @ARGV
by Thelonius (Priest) on Jan 26, 2003 at 17:32 UTC
    I asked this a while ago. Unfortunately, there doesn't seem to be a way, but instead of while (<>), you can code the more complicated:
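    unshift(@ARGV, '-') unless @ARGV;    # no arguments: read STDIN
    while (defined($ARGV = shift)) {     # defined() so a file named "0" is not skipped
        open(FRED, $ARGV) or die "Can't open $ARGV: $!";
        binmode(FRED);                   # binary mode before the first read
        while (<FRED>) {
            ... # code for each line
        }
    }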
    For those of you smug about always being on Unix, it is now recommended to use binmode even on Unix to avoid unwanted Unicode semantics.
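The explicit loop above could also be wrapped in a small subroutine. What follows is only a minimal sketch, not code from this thread: the name each_raw_line and its callback interface are made up for illustration. It assumes a lexical filehandle per file, treats '-' as STDIN as the loop above does, and calls binmode before the first read.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): call $callback once per line
# of each named file, reading every file in binary mode.
# With no file names, '-' is assumed, meaning STDIN.
sub each_raw_line {
    my ($callback, @files) = @_;
    @files = ('-') unless @files;
    for my $file (@files) {
        my $fh;
        if ($file eq '-') {
            $fh = \*STDIN;
        }
        else {
            open($fh, '<', $file) or die "Can't open $file: $!";
        }
        binmode($fh);    # must happen before any read on the handle
        while (defined(my $line = <$fh>)) {
            $callback->($line, $file);
        }
        close($fh) unless $file eq '-';
    }
    return;
}
```

A call such as each_raw_line(sub { print length($_[0]), "\n" }, @ARGV); would then stand in for the while (<>) loop, at the cost of the usual $ARGV and $. magic.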
      For those of you smug about always being on Unix, it is now recommended to use binmode even on Unix to avoid unwanted Unicode semantics.
      1. Mac OS is sane too so you should address smug users from both camps.

      2. Saying "it is recommended" doesn't mean much without the source of the recommendation.

        Though the ability to set layers for filehandles has been grafted onto binmode(), they can also be set via the open pragma and overridden with the three argument form of open(). If I were making my own recommendation, I'd suggest these over binmode() as being clearer and more natural.

      3. You're oversimplifying.

        IO layers are, to be fair, a separate issue from that which started this discussion. The use of layers is a powerful abstraction which provides a lot of functionality over and above the ability to fix braindead line terminations and EOF markers.

        Furthermore, for the most part, perl tries pretty hard to do the Right Thing™ (on sane platforms) regardless of whether layers are specified. Backward compatibility is, as always, a high priority so the default behavior is for perl to behave as it always has. My understanding is that the exception occurs when one has specified their preference for Unicode semantics via one of the locale environment variables, in which case, presumably, you know what you are doing.
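Point 2 above, sketched for the three-argument open() case. This is a minimal illustration under stated assumptions, not anyone's recommended code: the helper name slurp_raw is hypothetical, while the :raw layer in the mode argument is the standard mechanism being referred to.

```perl
use strict;
use warnings;

# Hypothetical helper: return a file's bytes untranslated by asking for
# the :raw layer in the mode argument of three-argument open().
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Can't open $path: $!";
    local $/;            # no record separator: read the whole file at once
    my $bytes = <$fh>;
    close($fh);
    return $bytes;
}
```

The open pragma sets the same kind of default for every open() in its lexical scope, which is what makes it read more naturally than a separate binmode() call after each open.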

      To the best of my knowledge, using binmode on binary files is recommended regardless of platform but only for portability reasons. In my case, most of what I write is specific to unix anyway because it supports systems that are themselves tightly coupled to unix. Usually, my only portability concern is that my code will build and run unaltered on Irix, Solaris, various BSDs, and Linux. Unsurprisingly, I regularly ignore binmode() as being wholly superfluous and I'll continue to do so until I hear a good reason not to.

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: binmode for @ARGV (open.pm)
by tye (Sage) on Jan 30, 2003 at 21:13 UTC

    I checked back to see if anyone had yet looked up how to do this. Since no one had, I looked up binmode, which pointed me to perldoc open, which suggests:

    use open IN=>":raw", OUT=>":raw";
    but it doesn't work because the calls to open() that are done by <> don't occur in the lexical context of the use open and so are not affected by it? That sucks.

    This problem (binmode with <>) has been known for years and I recall solutions being discussed years ago so I was quite disappointed to not find the solution. Perhaps open.pm was the solution and then it was "fixed" to be lexically scoped which also made it useless for its original purpose? (probably not)

    Even

    *CORE::GLOBAL::open= sub {
        my( $fh, $arg, @args )= @_;
        my $ret= CORE::open( $fh, $arg, @args );
        binmode( $fh );
        return $ret;
    };
    doesn't work because <> calls some C code directly rather than acting like a call to open from Perl code. It calls Perl_do_open() which calls Perl_do_open9() which does check the flags set by open.pm. But it only checks the flags if it was called via an OP_OPEN opcode.

    So changing

    if (PL_op && PL_op->op_type == OP_OPEN) {
        /* set up disciplines */
        U8 flags = PL_op->op_private;
        in_raw = (flags & OPpOPEN_IN_RAW);
    to start with just
    if (PL_op) {
    might solve the problem?

    I guess I'll download modern Perl source and send such a patch to p5p and see how they take it.

                    - tye

      Thank you for looking into it, Tye. Some of that is still over my head (what's new:), but it's interesting to see that this is one of those cases where simplifying the code may improve its flexibility. Good luck with the patch. I have encountered several situations in which this 'fix' would have simplified my code when writing filters for binary files.

      If I ever work out how to get MinGW to build the sources, I guess I could add the patch to my own copy. First things first though:)


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      Does the PERLIO environment variable (described in perlrun) help in this case?

      In Perl 5.6 and some books the :raw layer (previously sometimes also referred to as a "discipline") is documented as the inverse of the :crlf layer. That is no longer the case - other layers which would alter binary nature of the stream are also disabled. If you want UNIX line endings on a platform that normally does CRLF translation, but still want UTF-8 or encoding defaults the appropriate thing to do is to add :perlio to PERLIO environment variable.

      Running "set PERLIO=:perlio" seemed to work under XP

      (Response well after original post)
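One way to check whether a setting such as PERLIO=:perlio took effect is to ask perl which layers are on a handle. The sketch below is only an illustration: the helper name has_crlf_layer is hypothetical, while PerlIO::get_layers is the core function documented in perldoc PerlIO.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a :crlf layer is active on the handle.
# PerlIO::get_layers is built into perl; no module needs to be loaded.
sub has_crlf_layer {
    my ($fh) = @_;
    my @crlf = grep { $_ eq 'crlf' } PerlIO::get_layers($fh);
    return scalar @crlf;
}
```

Inside a while (<>) loop, has_crlf_layer(\*ARGV) should then report whether the file currently open is still being CRLF-translated.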