I tend to write a lot of filtering scripts that read either from STDIN or from one or more files named in @ARGV -- I just start them with a while(<>) loop, and I like the flexibility of using them two different ways:
myscript some.data > output
# or
grep -h foo *.data | sort -u | myscript > output
But what if I need to apply binmode() on the input file handle (e.g. because the data needs to be read as utf8)? For stdin-stdout usage, that's no problem -- just:
binmode STDIN, ":utf8";
But what about for files in @ARGV? I know these get opened via the "magical" ARGV file handle, and I know (from having just tried it) that this does not DWIM:
#!/usr/bin/perl
binmode STDIN, ":utf8";   # covers pipe input
binmode ARGV, ":utf8";    # does not work -- handle isn't open yet
while (<>) {
    do_whatever( $_ )
}
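A minimal sketch of why that second binmode call has no effect: before the first read from <>, the ARGV handle has not been opened, so binmode has nothing to attach the layer to and returns false. The helper name argv_binmode_ok is hypothetical, made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tries to put :utf8 on ARGV and reports whether
# binmode succeeded (1) or failed (0).
sub argv_binmode_ok {
    no warnings 'unopened';    # silence "binmode() on unopened filehandle"
    return binmode(ARGV, ':utf8') ? 1 : 0;
}

# Nothing has been read through <> yet, so ARGV is still unopened here.
print argv_binmode_ok()
    ? "layer applied\n"
    : "binmode failed: ARGV is not open yet\n";
```

With warnings left on, Perl also reports the unopened handle, which is a quick way to spot the problem.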

I've tried using the '-C' option on the shebang line, but it turns out that -C has problems when there are other option flags on the shebang line. In fact there's a thread at perlbug (ticket #34087, for those keeping score) that indicates this flag is known to be broken and apparently will be phased out. (I first realized the problems when a script using -C, which worked in 5.8.7, failed to work in 5.8.8.)

Note that use encoding "utf8"; only affects STDIN and STDOUT -- no effect on ARGV. I know I can do something like this:

#!/usr/bin/perl
use strict;

my @files;
if ( @ARGV ) {
    @files = @ARGV;
}
elsif ( -t ) {
    # STDIN is a terminal and no files were named
    die "I want file names to open, or else pipeline input\n";
}
else {
    @files = "stdin";
}

for my $file ( @files ) {
    my $fh;
    if ( @ARGV ) {
        # explicit open, so the layer can be given directly
        open $fh, "<:utf8", $file or die "Can't open $file: $!\n";
    }
    else {
        binmode STDIN, ":utf8";
        $fh = \*STDIN;
    }
    while (<$fh>) { do_whatever( $_ ) }
}
But that sucks. I could also give up the convenience of "dual usage" -- e.g. just write scripts to read from STDIN only, and never use the "magical" ARGV file handle -- but that would be sad. Using environment or locale settings would be fairly impractical as well (consecutive command lines might need to use different encodings).

Can someone point out a better way to do this? Or maybe the powers that be could be talked into fixing and keeping the -C option? (I suppose this will be a non-issue when Perl 6 becomes the tool of choice...)
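One candidate worth testing is the open pragma, which sets default I/O layers for handles opened within its lexical scope; the :std flag extends the same layers to STDIN, STDOUT and STDERR. The sketch below assumes the implicit opens done by <> pick up those defaults in the perl being used, which should be checked on the target version. The helper line_lengths and the temp-file setup are hypothetical, there only to make the example self-checking.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use open qw(:std :utf8);    # default :utf8 layers, plus the standard handles
use File::Temp qw(tempfile);

# Hypothetical helper: reads the named files (or STDIN if none) through <>
# and returns the character length of each chomped line.
sub line_lengths {
    local @ARGV = @_;
    my @lengths;
    while ( defined( my $line = <> ) ) {
        chomp $line;
        push @lengths, length $line;
    }
    return @lengths;
}

# Write "cafe" with an accented e as raw UTF-8 bytes (5 bytes, 4 characters).
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out, ':raw';
print {$out} "caf\xc3\xa9\n";
close $out or die "Can't close $path: $!\n";

my @lengths = line_lengths($path);
print "@lengths\n";    # 4 if the input was decoded as UTF-8, 5 if read as bytes
```

Because the pragma is lexically scoped, it has to be in effect where the while (<>) loop is written, not just somewhere earlier in a different file or block.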

(update: added declaration for $fh in the explicit-open code snippet, to make it grammatical)


In reply to Using binmode on ARGV filehandle? by graff
