This is a BOM for UTF-16 Big Endian-encoded files.
You are mistaken. It's the BOM, period. It can be encoded using UTF-8 and UTF-16le just as easily as with UTF-16be.
$ perl -MEncode -e'print encode("UTF-8", chr(0xFEFF))' | od -t x1
0000000 ef bb bf
0000003
$ perl -MEncode -e'print encode("UTF-16be", chr(0xFEFF))' | od -t x1
0000000 fe ff
0000002
$ perl -MEncode -e'print encode("UTF-16le", chr(0xFEFF))' | od -t x1
0000000 ff fe
0000002
| Hex | Meaning |
|---|---|
| FEFF | BOM |
| 2B,2F,76,38,2D | BOM encoded using UTF-7 |
| EF,BB,BF | BOM encoded using UTF-8 |
| FE,FF | BOM encoded using UTF-16be |
| FF,FE | BOM encoded using UTF-16le |
| 00,00,FE,FF | BOM encoded using UTF-32be |
| FF,FE,00,00 | BOM encoded using UTF-32le |
So you won't find the bytes FE,FF in a UTF-8 file, but just as in a UTF-16be file, you can find an encoded FEFF there (as the bytes EF,BB,BF).
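For anyone who wants to check the table rather than take it on trust, here is a minimal sketch that encodes U+FEFF with the core Encode module and prints the resulting bytes. The helper name bom_hex is made up for this post, and UTF-7 is left out to keep the example to encodings I'm sure Encode handles this way.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of Encode): encode U+FEFF in the given
# encoding and return the resulting bytes as comma-separated hex pairs.
sub bom_hex {
    my ($encoding) = @_;
    my $bom   = chr(0xFEFF);                 # the BOM as a character
    my $bytes = encode($encoding, $bom);     # the BOM as encoded bytes
    return join ',', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# Print one line per encoding, e.g. "UTF-8     EF,BB,BF".
printf "%-9s %s\n", $_, bom_hex($_)
    for qw(UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE);
```

The byte sequences printed should match the corresponding rows of the table above.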
I'm trying my best to understand this thread, but I'm having difficulty.
I'm dealing with the same issue where Notepad seems to add the BOM to the beginning of UTF-8 files. I've tried deleting it using all these commands, none of which works:
s/chr(0xEFBBBF)//; #remove Byte Order Mark
s/\x{EFBBBF}//;
s/^chr(0xFEFF)//;
s/^\x{FEFF}//;
Another clue: When I was using Strawberry Perl, I was able to use \x{064E} to refer to an Arabic vowel marker, and that worked. But now I'm using ActiveState, and that no longer works.
But I haven't been able to reference the BOM using either Strawberry or ActiveState. So I'm wondering if there's some sort of package I need to reference in order to make Perl recognize the \x{NNNN} format. Any suggestions?
Thanks,
The last one is the correct one, provided the text has been decoded first: it removes the BOM from decoded text, not from the raw UTF-8 bytes.
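To make that concrete, here is a rough sketch using only the core Encode module. The helper name strip_bom and the sample bytes are invented for illustration. Note that the first three attempts cannot work as written: a regex does not call chr(), and \x{EFBBBF} names a single (nonexistent) code point rather than the three bytes EF,BB,BF.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode UTF-8 bytes into characters, then strip a
# leading U+FEFF. The substitution only matches because the three bytes
# EF,BB,BF have already been decoded into the single character U+FEFF.
sub strip_bom {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    $text =~ s/^\x{FEFF}//;
    return $text;
}

# Sample input: what Notepad would write for "hello" saved as UTF-8.
my $raw  = "\xEF\xBB\xBFhello";
my $text = strip_bom($raw);
printf "%d characters: %s\n", length($text), $text;   # 5 characters: hello
```

If the data comes from a file, opening it with a decoding layer such as '<:encoding(UTF-8)' should put the first line in the same decoded state before the substitution runs.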
I'm trying my best to understand this thread, but I'm having difficulty.
Please stop trying; there is nothing for you here. Read perlunitut (Unicode in Perl, under Tutorials) and use the :via(File::BOM) layer.
I've tried deleting it using all these commands, none of which works:
Please stop that :) Read perlunitut and use :via(File::BOM); it will decode your file and remove the BOM for you.
If you've got raw data you want to share, you can use:
perl -MData::Dump -MFile::Slurp -e " dd scalar read_file shift, { qw/binmode :raw / }; " AnyKindOfInputFile > ThatFilesBytesAsPerlAsciiCode.pl
The different ways a BOM can look:
$ perl -MFile::BOM -MData::Dump -e " dd \%File::BOM::enc2bom "
{
# tied Readonly::Hash
"iso-10646-1" => "\xFE\xFF",
"UCS-2" => "\xFE\xFF",
"UTF-16BE" => "\xFE\xFF",
"UTF-16LE" => "\xFF\xFE",
"UTF-32BE" => "\0\0\xFE\xFF",
"UTF-32LE" => "\xFF\xFE\0\0",
"UTF-8" => "\xEF\xBB\xBF",
"utf8" => "\xEF\xBB\xBF",
}
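Going the other way (from leading bytes to an encoding name) can be sketched with core Perl alone. This is only an illustration of the idea, not how File::BOM does it internally; the helper name sniff_bom is made up. The one subtlety is order: FF,FE is a prefix of FF,FE,00,00, so the longer sequences have to be tested first.

```perl
use strict;
use warnings;

# BOM byte sequences, longest first, so that UTF-32LE (FF FE 00 00) is
# tested before UTF-16LE (FF FE), which is a prefix of it.
my @BOMS = (
    [ 'UTF-32BE' => "\x00\x00\xFE\xFF" ],
    [ 'UTF-32LE' => "\xFF\xFE\x00\x00" ],
    [ 'UTF-8'    => "\xEF\xBB\xBF"     ],
    [ 'UTF-16BE' => "\xFE\xFF"         ],
    [ 'UTF-16LE' => "\xFF\xFE"         ],
);

# Hypothetical helper: return the encoding name implied by a leading BOM,
# or undef if the bytes do not start with one.
sub sniff_bom {
    my ($bytes) = @_;
    for my $pair (@BOMS) {
        my ($name, $bom) = @$pair;
        return $name if index($bytes, $bom) == 0;
    }
    return;
}

print sniff_bom("\xEF\xBB\xBFhi")  // 'none', "\n";   # UTF-8
print sniff_bom("\xFF\xFE\x00\x00") // 'none', "\n";  # UTF-32LE
print sniff_bom("plain")           // 'none', "\n";   # none
```

This is a heuristic: FF,FE,00,00 could in principle also be a UTF-16le BOM followed by a NUL character, so treat the answer as a best guess.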