in reply to can't get rid of BOM from UTF-8 webpage

The codepage doesn't have anything to do with it: either the shell strips the BOM internally or it doesn't. However, if it implemented Unicode correctly, it should render the BOM as an invisible character. Of course a BOM is completely superfluous¹ in UTF-8 (Notepad, BTW, is notorious for writing one anyway), and I agree it could well be discarded upon reading. As it isn't, just strip it out as suggested in the post above.
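For anyone landing here without the earlier post at hand, this is roughly what "strip it out" amounts to. It's only a sketch in Python (not the code from the post above); the helper name `strip_utf8_bom` and the sample page are made up for illustration. The UTF-8 BOM is the three bytes EF BB BF at the very start of the data.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical downloaded page that starts with a BOM.
page = codecs.BOM_UTF8 + "<html>héllo</html>".encode("utf-8")

# Strip at the byte level, then decode as plain UTF-8.
text = strip_utf8_bom(page).decode("utf-8")

# Alternative: the "utf-8-sig" codec drops a leading BOM while decoding.
text_via_codec = page.decode("utf-8-sig")
```

Either way, data that has no BOM passes through unchanged, so it's safe to apply unconditionally.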

¹ OK, it could serve to identify UTF-8 from just the first three bytes if it were consistently applied, but as the standard doesn't recommend it, hardly anyone writes one, so identification of unknown files always has to rely on larger data chunks anyway.
