PerlMonks  

binmode file read error message

by SamCG (Hermit)
on Jan 20, 2006 at 15:13 UTC ( [id://524492]=perlquestion )

SamCG has asked for the wisdom of the Perl Monks concerning the following question:

While reading a UTF-16 encoded file (with "FIL" as the file handle), I get the following error partway through the file:
UTF-16:Unrecognised BOM 2550 at H:\script\exceptions.pl line 64, <FIL> line 127 7.
I did run a google search for "BOM 2550", and anticipate the FBI will be here soon. ;) I hope they know Perl.

I'm simply using
open FIL, $_ or die "could not open $_: $!\n";
binmode FIL, ":encoding(UTF-16)";
to open the file.

I do see the following in the documentation for binmode, but I don't quite grasp whether it bears directly on the issue I have.
Another consequence of using binmode() (on some systems) is that special end-of-file markers will be seen as part of the data stream. For systems from the Microsoft family this means that if your binary data contains \cZ, the I/O subsystem will regard it as the end of the file, unless you use binmode(). binmode() is not only important for readline() and print() operations, but also when using read(), seek(), sysread(), syswrite() and tell() (see the perlport manpage for more details). See the $/ and $\ variables in the perlvar manpage for how to manually set your input and output line-termination sequences.
Anyone know what it means, or how I can fix it?

update: On Windows XP

Replies are listed 'Best First'.
Re: binmode file read error message
by BrowserUk (Patriarch) on Jan 20, 2006 at 15:52 UTC

    BOM stands for Byte Order Mark. It is a short byte sequence at the start of the data (in UTF-16, 0xFE 0xFF for big endian or 0xFF 0xFE for little endian) which indicates the byte order of the data in a Unicode file. However, I can't see how the 2550 value relates.

    See Byte Order Mark for some further info.
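One way to check what a file really starts with is to read its first two bytes in raw mode. This is only a minimal sketch: `sniff_utf16_bom` is a hypothetical helper name, not something from Encode or from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the first two bytes of a file
# as a UTF-16 byte order mark (or report them in hex if they are not one).
sub sniff_utf16_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "could not open $path: $!\n";
    my $got = read $fh, my $head, 2;    # raw bytes, no decoding
    close $fh;
    return 'too short' unless defined $got && $got == 2;
    return 'UTF-16BE'  if $head eq "\xFE\xFF";    # big endian BOM
    return 'UTF-16LE'  if $head eq "\xFF\xFE";    # little endian BOM
    return 'no BOM, starts with 0x' . unpack('H4', $head);
}

# Usage: perl sniff.pl file1 file2 ...
print "$_: ", sniff_utf16_bom($_), "\n" for @ARGV;
```

Running it over the input files should show quickly whether one of them is not UTF-16 at all.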


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: binmode file read error message
by nobull (Friar) on Jan 20, 2006 at 18:48 UTC
    The first thing in a UTF-16 data stream (of unspecified endianness) must always be the byte order mark.

    Looking at the source of the Encode module, the error "Unrecognised BOM" is produced when an Encode::Unicode object has no endian attribute (yet) and encounters anything other than a valid BOM.

    So the above error would happen if a file that you were trying to open as UTF-16 was in fact ASCII (or UTF-8) and started with the two characters "%P" (bytes 0x25 0x50, hence the "2550"), or indeed was, say, a PDF file.
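The link between "%P" and the number in the message can be checked directly. The sample string below is an assumption standing in for the start of a PDF file:

```perl
use strict;
use warnings;

# First two bytes of something that starts like a PDF file.
my $head = substr "%PDF-1.4\n", 0, 2;

# 'H4' unpacks four hex digits (two bytes), high nybble first.
my $hex = unpack 'H4', $head;

print "$hex\n";    # prints 2550
```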

    If the Encode::Unicode object has a renewed attribute it will automatically update its own endian attribute upon seeing the initial BOM and can then subsequently process data without a BOM.

    The Encode::Unicode objects used by PerlIO are supposed to have this "renewed" attribute so you should never see this message except at the start of the file.
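If the byte order of the data is already known, naming it explicitly (UTF-16LE or UTF-16BE) sidesteps the BOM check altogether, because the decoder then starts out with its endianness set. A small sketch with made-up sample bytes:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "hi" as UTF-16LE bytes, with no BOM in front.
my $bytes = "h\x00i\x00";

# With the byte order named explicitly, no BOM is required.
my $text = decode('UTF-16LE', $bytes);

print "$text\n";    # prints hi
```

The same names work as I/O layers, e.g. ":encoding(UTF-16LE)". Note that with an explicit byte order a leading BOM is not stripped; it comes through as the character U+FEFF.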

    Note that when Perl says "<FIL> line 127" in an error message this just means that the last file read operation was to read line 127 of FIL. It does not necessarily mean that it was the data read from that line that was actually being processed.

    In particular, if the error is occurring in the process of reading the first line from a file, it's possible that the error message would reflect the previous read operation. I was at first unable to reproduce this, but I have found that it happens if you reopen a filehandle:

    use strict;
    while (<DATA>) {};
    # This warning appends "<DATA> line 3"
    warn "Something not related to the DATA";
    open DATA, '<:encoding(UTF16)', 'simple.pdf' or die $!;
    # This gives error UTF-16:Unrecognised BOM 2550 ...,<DATA> line 3.
    <DATA>;
    __DATA__
    xx
    xx
    xx

    Please try to produce a minimal but complete script and data file combination that can reproduce the error.

    The moral of the story: don't use Perl4-style bare filehandles.
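A minimal sketch of the lexical-filehandle style, assuming the file really is UTF-16 with a BOM. `read_utf16_lines` is a hypothetical helper name. Each call gets a fresh handle, so there is no line count left over from an earlier file to show up in a later error message.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a UTF-16 file through a
# lexical filehandle that exists only for the duration of the call.
sub read_utf16_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-16)', $path
        or die "could not open $path: $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Usage: perl read16.pl file1 file2 ...
for my $path (@ARGV) {
    my @lines = read_utf16_lines($path);
    printf "%s: %d lines\n", $path, scalar @lines;
}
```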
