Re: UTF-8 text files with Byte Order Mark

so I kinda assume that Perl will handle this kind of stuff for me.

Having Perl remove the BOM automatically would be bad. print while <$fh>; would no longer print out a file exactly, for example. It wouldn't be possible to print out a file exactly by other means either.
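To illustrate, here is a minimal sketch (the in-memory sample data and the two helper names are hypothetical). Perl's :encoding(UTF-8) input layer decodes a BOM into the character U+FEFF rather than discarding it, and a raw read keeps the three bytes EF BB BF. Either way the BOM survives, which is what keeps an exact copy possible.

```perl
use strict;
use warnings;

# Hypothetical sample: a UTF-8 file body that starts with a BOM (EF BB BF).
my $bytes = "\xEF\xBB\xBFname=Alice\n";

# Decoding read: the BOM becomes the character U+FEFF; it is not removed.
sub first_line_decoded {
    my ($data) = @_;
    open(my $fh, '<:encoding(UTF-8)', \$data) or die "open: $!";
    my $line = <$fh>;
    close($fh);
    return $line;
}

# Raw read: the BOM stays as three bytes, so the data can be copied exactly.
sub first_line_raw {
    my ($data) = @_;
    open(my $fh, '<:raw', \$data) or die "open: $!";
    my $line = <$fh>;
    close($fh);
    return $line;
}

my $decoded = first_line_decoded($bytes);
my $raw     = first_line_raw($bytes);
printf "decoded starts with U+%04X\n", ord($decoded);   # U+FEFF
printf "raw is byte-identical: %s\n", ($raw eq $bytes ? 'yes' : 'no');
```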

However, if the file contains that BOM, my program does not understand the first line in the file.

Patient: "Doctor, it hurts when I do this."
Doctor: "So don't do it!"

If your program doesn't accept BOMs, don't feed it any. BOMs are not required.
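If the files come from your own Perl code, note that Perl's :encoding(UTF-8) output layer does not prepend a BOM. A minimal sketch, writing to an in-memory buffer (the helper name and sample text are made up for illustration):

```perl
use strict;
use warnings;

# Encode a string to UTF-8 through Perl's output layer and return the bytes.
sub utf8_bytes_via_layer {
    my ($text) = @_;
    my $buf = '';
    open(my $out, '>:encoding(UTF-8)', \$buf) or die "open: $!";
    print {$out} $text;
    close($out);    # flushes the encoded bytes into $buf
    return $buf;
}

my $bytes = utf8_bytes_via_layer("caf\x{E9}\n");
# No EF BB BF prefix: the output starts directly with the encoded text.
print( (substr($bytes, 0, 3) eq "\xEF\xBB\xBF") ? "BOM written\n" : "no BOM\n" );
```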

Alternatively, you could change your spec and your program to accept it.

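open(my $fh, '<:encoding(UTF-8)', $file)   # decode, so the BOM reads as U+FEFF
   or die "Can't open $file: $!";
while (<$fh>) {
   s/^\x{FEFF}// if $. == 1;   # drop a leading BOM, first line only
   # ... process $_ as before ...
}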
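A self-contained sketch of that approach, reading hypothetical sample data from an in-memory filehandle (the function name is illustrative, not from any module). It strips U+FEFF only at the very start of the first line, so a U+FEFF later in the text, where it means ZERO WIDTH NO-BREAK SPACE, is left alone. Because the lines arrive decoded, the BOM has to be matched as the character \x{FEFF}, not as the bytes EF BB BF.

```perl
use strict;
use warnings;

# Read all lines from a UTF-8 byte string, dropping a leading BOM if present.
sub read_lines_without_bom {
    my ($bytes) = @_;
    open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open: $!";
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/^\x{FEFF}// if $. == 1;   # only a BOM at the start of the file
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

my @with    = read_lines_without_bom("\xEF\xBB\xBFname=Alice\nage=30\n");
my @without = read_lines_without_bom("name=Alice\nage=30\n");
print "same result\n" if join('', @with) eq join('', @without);
```

Matching with ^ and checking $. avoids the /g approach's side effect of deleting legitimate U+FEFF characters elsewhere in the file.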

Re^2: UTF-8 text files with Byte Order Mark by muba (Priest) on Feb 13, 2007 at 20:05 UTC
Patient: "Doctor, it hurts when I do this."
Doctor: "So don't do it!"

Easy to say, of course, but what if the program one of my users uses stores that BOM anyway? Besides, as pointed out, a BOM in a UTF-8 file is valid, so I feel I should support it. Look, if the user were toying around with malformed files I'd be more than happy to tell him to get that fixed :D but apparently he's doing what he rightly thinks is right.
Re^3: UTF-8 text files with Byte Order Mark by ikegami (Patriarch) on Feb 13, 2007 at 20:36 UTC
a BOM in a utf-8 file is valid

"!" in an ASCII file is also valid. But if you place a "!" at the start of your Perl program, it probably will not compile. It is a malformed file, not from a Unicode perspective, but from your parser's perspective. I provided two alternatives (removing the BOM and File::BOM) that will work with your broken tools (i.e. tools that add undesirable characters to the files you edit). I'd go with them, since allowing the BOM is surely a good thing.
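To make the parser-perspective argument concrete, here is a hypothetical key=value line parser (not from the thread). U+FEFF is a perfectly valid character, but it is not a word character, so a decoded first line that still carries it fails the pattern, much as a stray "!" would:

```perl
use strict;
use warnings;

# Hypothetical parser: accepts lines of the form key=value.
sub parse_pair {
    my ($line) = @_;
    return unless $line =~ /^(\w+)=(.*)$/;
    return ($1, $2);
}

my $with_bom = "\x{FEFF}name=Alice";   # first line as decoded from a BOM'd file
my $plain    = "name=Alice";

my @a = parse_pair($with_bom);   # empty list: U+FEFF is not matched by \w
my @b = parse_pair($plain);      # ('name', 'Alice')
printf "with BOM: %d fields, without: %d fields\n", scalar(@a), scalar(@b);
```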
Re^4: UTF-8 text files with Byte Order Mark by muba (Priest) on Feb 13, 2007 at 20:43 UTC
Ouch. I'm afraid I used the wrong tone in my previous reply. You see, I am now removing that BOM myself (as you can read below). I never meant to attack or criticize you. In fact, I much appreciate your input!
Re^2: UTF-8 text files with Byte Order Mark by Anonymous Monk on Jul 24, 2019 at 20:56 UTC
"If your program doesn't accept BOMs, don't feed it any. BOMs are not required." This is a mind-bogglingly stupid statement that ignores, or even turns on its head, the Robustness principle. Anyone who writes something so inane and so dangerous should be barred for life from software development.