in reply to Re^4: Unicode and text files
in thread Unicode and text files
is there any way to automagically determine what encoding a file is?
That's precisely what the BOM ("byte order mark") is for. If you don't specify a byte order when creating a file, Perl writes a BOM for you; if you do specify one (utf16le or utf16be), the file will be "BOM-less". Files created without an explicit byte order can be read back using plain :encoding(utf16):
    $ /usr/bin/perl
    use strict;
    use warnings;

    my $c = 'a';
    my $fd;

    open $fd, '>:encoding(utf16le)', 'foo-le' or die "open: $!";
    print $fd $c;
    close $fd;

    open $fd, '>:encoding(utf16be)', 'foo-be' or die "open: $!";
    print $fd $c;
    close $fd;

    open $fd, '>:encoding(utf16)', 'foo' or die "open: $!";
    print $fd $c;
    close $fd;
    __END__
    $ xxd foo-le
    0000000: 6100                                     a.
    $ xxd foo-be
    0000000: 0061                                     .a
    $ xxd foo
    0000000: feff 0061                                ...a
    $ /usr/bin/perl
    open my $fd, '<:encoding(utf16)', 'foo' or die "open: $!";
    print while <$fd>;
    close $fd;
    __END__
    a
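If you'd rather inspect the BOM yourself before choosing a layer, something along these lines might do. It is only a sketch: sniff_bom and sniff_file are hypothetical helper names of mine, not anything Perl provides, and the list of marks covers just UTF-8, UTF-16 and UTF-32.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl or Encode): map the leading bytes
# of a buffer to an encoding name, or return nothing when no BOM is found.
sub sniff_bom {
    my ($bytes) = @_;
    # Test the 4-byte UTF-32 marks first: the UTF-32LE BOM begins with
    # the same two bytes as the UTF-16LE BOM.
    return 'UTF-32LE' if $bytes =~ /\A\xFF\xFE\x00\x00/;
    return 'UTF-32BE' if $bytes =~ /\A\x00\x00\xFE\xFF/;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;
}

# Hypothetical helper: read the first four raw bytes of a file and sniff them.
sub sniff_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    read $fh, my $head, 4;
    close $fh;
    return sniff_bom($head // '');
}
```

Going by the xxd dumps above, sniff_file('foo') should report UTF-16BE, while foo-le and foo-be have no BOM and come back undefined. One caveat: a UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE by this test.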
Update: Of course, I realized after clicking "Create" that I didn't really answer your actual question :^). Well, if files don't have a BOM, you can only guess or brute-force them. Or add a BOM to them ;^).
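For the brute-force route, one rough approach is to try a strict decode with each candidate and keep the ones that don't fail. The sketch below assumes the core Encode module; plausible_encodings is a made-up name and the candidate list is whatever you pass in.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the candidate encodings under which
# $bytes decodes without error.
sub plausible_encodings {
    my ($bytes, @candidates) = @_;
    my @ok;
    for my $enc (@candidates) {
        my $copy = $bytes;    # decode() may modify its input when CHECK is set
        my $text = eval { decode($enc, $copy, Encode::FB_CROAK) };
        push @ok, $enc if defined $text;
    }
    return @ok;
}

# A lone 0xE9 byte is not valid UTF-8, so only ISO-8859-1 survives here.
my @guesses = plausible_encodings("caf\xE9", 'UTF-8', 'ISO-8859-1');
print "@guesses\n";
```

This only rules encodings out; it never proves one right. In particular, most even-length byte strings decode cleanly as both UTF-16LE and UTF-16BE, so for UTF-16 you would still need some extra heuristic on top. The Encode::Guess module that ships with Encode takes a similar approach, if you'd rather not roll your own.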
.--
David Serrano