I guess your input file contains non-UTF-8 characters, then. I think I was in a similar situation when I was mailed some UTF-8-encoded text documents and, for some reason I don't know, there was a non-UTF-8 character at the very beginning. I guess the easiest way to go is to call getc(IN) just before the loop. This assumes, though, that there really is an invalid character there; adding some tests on the return value of getc may be necessary if you're not sure.

...I could also be totally wrong, and your input file is OK and the problem is somewhere else.

Update: FEFF is the Unicode code point of the BOM (Byte Order Mark). I wrongly said that there is a non-UTF-8 character (meaning a non-UTF-8 byte sequence). Of course, according to the error message, the Unicode character simply has no equivalent in cp1256. Thanks to almut for a proper explanation.

In reply to Re^3: Conversion from UTF-8 to windows-1256 encoding
by Sixtease
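
A minimal sketch of the getc idea from the post, with the test on the return value it mentions. It assumes the input is decoded through an :encoding(UTF-8) layer, so getc returns a whole character rather than a byte. The helper name convert_skipping_bom and the in-memory filehandle are made up for illustration; the original code in the thread reads from a handle called IN and is not shown here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the thread): take UTF-8 bytes, drop a
# leading BOM (U+FEFF) if there is one, and return windows-1256 bytes.
sub convert_skipping_bom {
    my ($utf8_bytes) = @_;

    # In-memory handle used only to keep the example self-contained;
    # a real script would open the input file with the same layer.
    open(my $in, '<:encoding(UTF-8)', \$utf8_bytes)
        or die "Cannot open in-memory input: $!";

    my $out = '';

    # Read one character before the loop and test it, instead of
    # discarding it blindly: keep it unless it is the BOM.
    my $first = getc($in);
    $out .= $first if defined $first && $first ne "\x{FEFF}";

    while (my $line = <$in>) {
        $out .= $line;
    }
    close($in);

    # FB_CROAK makes encode die on any character that has no
    # equivalent in cp1256, rather than substituting silently.
    return encode('cp1256', $out, Encode::FB_CROAK);
}
```

Called as convert_skipping_bom("\xEF\xBB\xBFhello\n"), this should return the bytes of "hello\n" without the BOM, and input that has no BOM passes through with its first character intact.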