PerlMonks  
The first thing in a UTF-16 data stream (of unspecified endianness) must always be the byte order mark.

Looking at the source of the Encode module, the error "Unrecognised BOM" is produced when an Encode::Unicode object has no endian attribute (yet) and encounters anything other than a valid BOM.

So the above error would happen if a file that you were trying to open as UTF16 was in fact ASCII (or UTF8) and started with the two characters "%P" (bytes 0x25 0x50), or indeed was, say, a PDF file, which starts with "%PDF".
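To check whether this is what is going on, you can look at the first two bytes of the file yourself before handing it to the UTF-16 layer. What follows is only a minimal sketch: sniff_utf16_bom is a hypothetical helper name, not part of Encode, and the two scratch files are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of Encode): report which UTF-16 BOM,
# if any, a file starts with.
sub sniff_utf16_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $n = read($fh, my $head, 2);
    close $fh;
    return 'none' unless defined $n && $n == 2;
    return 'BE' if $head eq "\xFE\xFF";   # big-endian BOM
    return 'LE' if $head eq "\xFF\xFE";   # little-endian BOM
    return 'none';
}

# Scratch file that starts like a PDF: first two bytes are "%P" (0x25 0x50).
my ($pdf_fh, $pdf_path) = tempfile(UNLINK => 1);
binmode $pdf_fh;
print {$pdf_fh} "%PDF-1.4\n";
close $pdf_fh or die "close: $!";

# Scratch file that starts with a little-endian BOM followed by "hi".
my ($utf_fh, $utf_path) = tempfile(UNLINK => 1);
binmode $utf_fh;
print {$utf_fh} "\xFF\xFE", "h\0i\0";
close $utf_fh or die "close: $!";

my $pdf_kind = sniff_utf16_bom($pdf_path);
my $utf_kind = sniff_utf16_bom($utf_path);
print "pdf-like file: $pdf_kind\n";
print "utf-16 file:   $utf_kind\n";
```

If the helper reports no BOM, opening the file with an endianness-unspecified UTF-16 layer is what I would expect to trigger the error above.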

If the Encode::Unicode object has the "renewed" attribute, it will automatically update its own endian attribute upon seeing the initial BOM and can then process subsequent data without a BOM.

The Encode::Unicode objects used by PerlIO are supposed to have this "renewed" attribute so you should never see this message except at the start of the file.

Note that when Perl says "<FIL> line 127" in an error message, this just means that the last file read operation was to read line 127 of FIL. It does not necessarily mean that it was the data read from that line that was actually being processed.

In particular, if the error occurs while reading the first line of a file, it's possible that the error message reflects the previous read operation. I'm unable to reproduce this in general, but I have found that it happens if you reopen a filehandle.

use strict;
while (<DATA>) {};
# This warning appends "<DATA> line 3"
warn "Something not related to the DATA";
open DATA, '<:encoding(UTF16)', 'simple.pdf' or die $!;
# This gives error UTF-16:Unrecognised BOM 2550 ...,<DATA> line 3.
<DATA>;
__DATA__
xx
xx
xx
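The stale line number can be seen without any encoding layer at all. This is a small sketch with a made-up scratch file: it relies on the documented behaviour that the implicit close done by open does not reset $., whereas an explicit close does.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up three-line scratch file, standing in for the __DATA__ section.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "xx\n" for 1 .. 3;
close $tmp or die "close: $!";

open my $fh, '<', $path or die "open: $!";
while (my $line = <$fh>) { }          # read all three lines
my $after_read = $.;

# Reopen the same handle with no intervening close: $. is not reset.
open $fh, '<', $path or die "reopen: $!";
my $after_reopen = $.;

# An explicit close does reset the counter.
close $fh or die "close: $!";
my $after_close = $.;

print "after reading: $after_read\n";
print "after reopen:  $after_reopen\n";
print "after close:   $after_close\n";
```

So a warning or die raised between the reopen and the completion of the first read still gets the old line number appended.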

Please try to produce a minimal but complete script and data file combination that can reproduce the error.

The moral of the story: don't use Perl4-style bare filehandles.
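For comparison, here is a sketch of the same sequence with a fresh lexical filehandle for the second open. The scratch file and variable names are made up; the point is only that each handle keeps its own line count, so the new handle does not inherit the old one's "line 3".

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Made-up three-line scratch file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "xx\n" for 1 .. 3;
close $tmp or die "close: $!";

open my $first, '<', $path or die "open: $!";
while (my $line = <$first>) { }       # read all three lines

# A new lexical handle instead of reopening the old one.
open my $second, '<', $path or die "open: $!";
my $line = <$second>;                 # first read on the new handle

my $first_count  = $first->input_line_number;
my $second_count = $second->input_line_number;
my $current      = $.;                # follows the last handle read from

print "first handle:  $first_count\n";
print "second handle: $second_count\n";
print "\$. now:        $current\n";
```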


In reply to Re: binmode file read error message by nobull
in thread binmode file read error message by SamCG
