Actually, binmode is definitely the preferred method, along with the 3-argument form of open for file handles. The -C command-line switch has some problems, and it is likely to be phased out in the future.
You can easily make the encoding a configurable parameter, to be set just once and used consistently throughout the app. Depending on how you've written the app so far, you might just need to convert your "open" statements to use the 3-arg format:
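(Update: added a second open() example to make a point: this way, Perl will always be dealing with Unicode character strings, so that "." always matches one character, "uc" does the right thing, etc.)

```perl
use strict;
use warnings;

# during initialization:
my $encoding = ":encoding(UTF-8)";   # or ":utf8", ":encoding(cp1252)", or whatever
binmode STDOUT, $encoding;           # (if this is appropriate)
binmode STDIN,  $encoding;
# ...
# then make all open statements look like this:
my ( $ifilename, $ofilename ) = @ARGV;   # example file names
open( my $inhandle,  "<$encoding", $ifilename ) or die "Can't read $ifilename: $!";
open( my $outhandle, ">$encoding", $ofilename ) or die "Can't write $ofilename: $!";
```

Note that the layer string needs its leading colon: a mode such as "<utf8" is rejected by 3-arg open, while "<:utf8" or "<:encoding(UTF-8)" works.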
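To make the point about character strings concrete, here is a small sketch using the core Encode module (my choice for illustration, not something from the original post). It compares the raw UTF-8 bytes for "é" (U+00E9), which is what you get from a handle with no encoding layer, with the decoded character string, which is what an encoding layer hands you.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" (U+00E9) as UTF-8 bytes, i.e. what a raw filehandle would give you
my $bytes = encode( 'UTF-8', "\x{e9}" );   # two bytes: 0xC3 0xA9
# the same text decoded, i.e. what an encoding layer would give you
my $chars = decode( 'UTF-8', $bytes );     # one character: U+00E9

printf "bytes: length %d, matches /^.\$/: %s\n",
    length $bytes, ( $bytes =~ /^.$/s ? 'yes' : 'no' );
printf "chars: length %d, matches /^.\$/: %s\n",
    length $chars, ( $chars =~ /^.$/s ? 'yes' : 'no' );

# uc() on the character string maps U+00E9 to U+00C9
printf "uc(chars) is U+%04X\n", ord uc $chars;
```

With the byte string, "." matches only half of the character and length reports 2; with the decoded string, both behave as you would expect for one character.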
There's also the "use open" pragma, although I can't seem to get it to work for output file handles. (It works great for setting the encoding on input, especially if you use the magical ARGV file handle.)
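A minimal sketch of the pragma, assuming a Perl recent enough to accept the `IO =>` form (the temp-file setup via File::Temp is only there to make the example self-contained). The pragma sets default layers for open() calls in its lexical scope, including 3-arg opens that give no layer and the implicit open behind `<>`. It does not touch STDIN/STDOUT/STDERR unless you also add `:std`, as in `use open qw(:std :encoding(UTF-8));`; that may be one reason output handles can seem unaffected.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Default layers for open() in this lexical scope, for both input and output.
use open IO => ':encoding(UTF-8)';

my ( $tmp, $path ) = tempfile( UNLINK => 1 );
close $tmp;

# A 3-arg open with no explicit layer picks up the pragma's output layer,
# so "é" (U+00E9) is written as the two UTF-8 bytes 0xC3 0xA9.
open( my $out, '>', $path ) or die "Can't write $path: $!";
print {$out} "caf\x{e9}\n";
close $out or die "close: $!";

# The magical ARGV handle (<>) also picks up the pragma's input layer.
local @ARGV = ($path);
my $line = <>;
chomp $line;
printf "%d characters\n", length $line;   # decoded: 4 characters, not 5 bytes
```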
But I see your point with binary files.
Yes, there really were a lot of people (especially on Red Hat systems with Perl 5.8.0, as it turned out) with a lot of Perl scripts that handled binary data and assumed the "text/binary" file-mode distinction was not an issue for them ("just open the file..."). Then suddenly, when a file handle's encoding mode was set by default to match the user's locale (which on those systems defaulted to UTF-8), all hell broke loose.
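For binary files, one defensive habit is to ask for raw bytes explicitly rather than relying on whatever the default layer happens to be. The sketch below simulates a default text-encoding layer with `use open` (an assumption standing in for the locale-driven default described above) and shows that an explicit `:raw` layer in a 3-arg open overrides it, so the script still sees the exact bytes in the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Simulate an environment where a text encoding layer is the default
# for every open() in this scope.
use open ':encoding(UTF-8)';

# Write four arbitrary bytes, some of which are not valid UTF-8.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;                      # raw bytes, no encoding layer
print {$tmp} "\xff\xfe\x00\xc3";
close $tmp or die "close: $!";

# An explicit ":raw" layer overrides the pragma's default,
# so nothing is decoded and no bytes are altered.
open( my $in, '<:raw', $path ) or die "Can't read $path: $!";
my $data = do { local $/; <$in> };
close $in;

printf "read %d bytes: %s\n", length $data, unpack( 'H*', $data );
```

Calling `binmode $fh;` right after a 2-arg open achieves the same thing for older scripts that cannot easily be converted to 3-arg open.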
That sort of default behavior has since been discontinued (corrected), and the people with those old scripts are still out there, blissfully unconcerned that some other people would like UTF-8 to be the default file mode. These are hard times for choosing default behaviors...