in reply to Re: Getting Data from an Excel File
in thread Getting Data from an Excel File

++ Excellent reply.

When you convert it as shown above, however, the output file will have a BOM (presumably because it's then converted just like any other codepoint), which is recommended on Windows.

Recommended by whom? Microsoft Corp.?

I don't like BOMs in UTF-8 files on any platform. A BOM in a text file that is otherwise all ASCII kills its backward compatibility with so-called "legacy" software, which is a big part of the raison d'être of the UTF-8 encoding form. In my experience, most modern applications that understand Unicode will figure out the UTF-8-ness of a BOM-less text file, whereas almost no legacy software will tolerate a BOM in an ASCII file.
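To make the compatibility point concrete, here is a minimal sketch in Python (a language picked purely for illustration, not something used in this thread) that compares the bytes of an all-ASCII string encoded with and without a UTF-8 BOM. The sample text and the helper name `is_pure_ascii` are made up for the example.

```python
import codecs

text = "name,qty\nwidget,3\n"  # hypothetical all-ASCII sample content

plain = text.encode("utf-8")         # UTF-8 without a BOM
with_bom = text.encode("utf-8-sig")  # same text, prefixed with the UTF-8 BOM

# Without a BOM, the UTF-8 bytes are identical to the ASCII bytes.
assert plain == text.encode("ascii")

# The BOM adds three non-ASCII bytes (EF BB BF) at the very start.
assert with_bom == codecs.BOM_UTF8 + plain


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


print(is_pure_ascii(plain))     # True
print(is_pure_ascii(with_bom))  # False
```

A tool that only understands ASCII sees the first file as ordinary text, but sees three unexpected high bytes at the top of the second.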

See this entry and following ones in the Unicode UTF/BOM FAQ.

Jim

Re^3: Getting Data from an Excel File
by almut (Canon) on Feb 27, 2008 at 21:54 UTC
    Recommended by whom? Microsoft Corp.?

    Not sure what Microsoft's official recommendation is in this regard (if anyone knows, please share). My "is recommended" statement is just my summary of personal experience, in particular from having worked in Japanese Windows environments for a couple of months.

    My impression there was that overall you'll run into the fewest problems if you always tag Unicode files as such using a BOM (be they UTF-8, UTF-16 or UCS-2). Some programs will try auto-detection (with varying success), but many simply assume the file is in the default legacy encoding unless told otherwise. YMMV of course, depending on which applications you're primarily working with. So please take this with a grain of salt.
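    The three behaviours described above (honour a BOM tag, try auto-detection, fall back to a legacy default) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not how any particular Windows application works: the function name `decode_text` is hypothetical, `cp1252` is just a stand-in for "the default legacy encoding", and only UTF-8 and UTF-16 BOMs are checked.

```python
import codecs


def decode_text(data: bytes, legacy_encoding: str = "cp1252") -> str:
    """Decode bytes, preferring a BOM tag, then strict UTF-8, then a legacy fallback."""
    # 1. An explicit tag wins: check for a UTF-8 or UTF-16 BOM.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the utf-16 codec consumes the BOM itself

    # 2. No tag: attempt auto-detection via strict UTF-8 validation.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # 3. Not valid UTF-8: assume the legacy default (an assumption here).
        return data.decode(legacy_encoding)
```

    A program that skips step 2 and goes straight to the legacy fallback is the kind that only reads an untagged UTF-8 file correctly when it happens to be pure ASCII, which is where the BOM helps.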

    I don't like BOMs in UTF-8 files on any platform...

    I personally don't like them either, in particular on Unix platforms, where they tend to create more problems than they solve. OTOH, I've gotten used to the fact that different platforms have different approaches and philosophies. After all, with Perl in my handbag, this isn't too much of an issue anyway...