Binmode looks promising besides using -C, but both have the disadvantage of hardcoding the machine/platform into the script.

Actually, binmode is definitely the preferred method, along with 3-arg open for file handles. There are some problems with -C, and this option is likely to get phased out in the future.

You can easily make the encoding a configurable parameter, to be set just once and used consistently throughout the app. Depending on how you've written the app so far, you might just need to convert your "open" statements to use the 3-arg format:

    # during initialization:
    $encoding = ":utf8";    # or ":encoding(cp1252)" or whatever
    binmode STDOUT, $encoding;    # (if this is appropriate)
    binmode STDIN, $encoding;
    # ...
    # then make all open statements look like this:
    open( INHANDLE, "<$encoding", $ifilename ) or die "$ifilename: $!";
    open( OUTHANDLE, ">$encoding", $ofilename ) or die "$ofilename: $!";

(update: added a second open() example to make a point: this way, perl will always be dealing with unicode character strings, so that "." always matches one character, "uc" does the right thing, etc.)
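
To make the point about character strings concrete, here is a minimal round-trip sketch. It assumes the encoding is held in a single variable (`$encoding` here, which in a real app might come from a config file or a command-line option), and it uses a File::Temp scratch file purely for illustration; none of these names come from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical config value, set once and reused for every open().
my $encoding = ":encoding(UTF-8)";

# Scratch file for the demonstration (removed when the program exits).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# Four characters; the last one is e-acute (U+00E9).
my $text = "caf\x{e9}";

# Write through the configured layer.
open(my $out, ">$encoding", $filename) or die "Cannot write $filename: $!";
print {$out} $text;
close $out or die "Cannot close $filename: $!";

# Read back through the same layer: perl hands us characters.
open(my $in, "<$encoding", $filename) or die "Cannot read $filename: $!";
my $decoded = <$in>;
close $in;

# Read back with no decoding: perl hands us the bytes on disk.
open(my $raw, "<:raw", $filename) or die "Cannot read $filename: $!";
my $bytes = <$raw>;
close $raw;

printf "characters: %d, bytes on disk: %d\n", length($decoded), length($bytes);
```

The decoded string has 4 characters while the file holds 5 bytes, because U+00E9 takes two bytes in UTF-8. Changing the one `$encoding` line (to ":encoding(cp1252)", say) changes what lands on disk without touching any other code.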

There's also the "use open" pragma, although I can't seem to get it to work for output file handles. (Works great for setting encoding mode on input -- esp. if you use the magical ARGV file handle.)
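
For what it's worth, here is a small sketch of how the pragma is documented to behave for output handles. My understanding (an assumption worth checking against your own perl version) is that it is lexically scoped, that it only affects handles opened inside that scope without an explicit layer, and that already-open handles such as STDOUT need the separate `:std` subpragma. The file name and test string are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (removed when the program exits).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

my $text = "caf\x{e9}";    # 4 characters, the last one is U+00E9

{
    # Lexically scoped default: output handles opened in this block
    # without an explicit layer get :encoding(UTF-8).
    use open OUT => ':encoding(UTF-8)';

    open(my $out, '>', $filename) or die "Cannot write $filename: $!";
    print {$out} $text;
    close $out or die "Cannot close $filename: $!";
}

# Outside the block, and with an explicit :raw layer, we see plain bytes.
open(my $raw, '<:raw', $filename) or die "Cannot read $filename: $!";
my $bytes = <$raw>;
close $raw;

printf "characters written: %d, bytes on disk: %d\n",
    length($text), length($bytes);
```

If the pragma took effect, the file holds 5 bytes for the 4 characters written. A file of 4 bytes would mean the handle was opened without the encoding layer, which is the symptom to look for if the pragma sits in a different scope than the open() call.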

But I see your point with binary files.

Yes, there really were a lot of people (esp. on Red Hat systems with Perl 5.8.0, as it turned out) with a lot of perl scripts that handled binary data and assumed the "text/binary" file-mode distinction was not an issue for them ("just open the file..."). And then suddenly, when a file handle's encoding mode was set by default to be consistent with the user's locale (which by default was utf8), all hell broke loose.
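
The defensive habit for scripts that handle binary data is to say so explicitly, so that no locale-driven or pragma-driven default can sneak an encoding layer onto the handle. A minimal sketch, using a made-up payload and a File::Temp scratch file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (removed when the program exits).
my ($fh, $filename) = tempfile(UNLINK => 1);

# One-argument binmode: treat the handle as raw bytes, no translation.
binmode $fh;

# Every possible byte value, 0 through 255.
my $payload = pack("C*", 0 .. 255);
print {$fh} $payload;
close $fh or die "Cannot close $filename: $!";

# Read it back, again stating explicitly that this is binary data.
open(my $in, '<:raw', $filename) or die "Cannot read $filename: $!";
my $readback = do { local $/; <$in> };    # slurp the whole file
close $in;

print $readback eq $payload
    ? "binary round trip is byte-for-byte identical\n"
    : "binary data was altered in transit\n";
```

With the handles marked as binary on both sides, the 256 bytes come back unchanged regardless of what the default file mode happens to be on that machine.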

That sort of default behavior has been discontinued (corrected), and those people with those old scripts are still out there, blissfully ignoring how some other people would like utf8 to be the default file mode. These are hard times for setting up default behaviors...


In reply to Re^4: bug in utf8 handling? by graff
in thread bug in utf8 handling? by jethro
