in reply to Parsing OpenOffice Spreadsheets

Unfortunately, Text::CSV_XS chokes on both 8859-1 and utf8 text.
Did you set binary => 1, which is required for extended character sets? From the docs: "Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde). In binary mode all characters are accepted, at least in quoted fields."

Yes, I know "binary" does not mean what one thinks it means here (inconceivable!), but the docs are otherwise pretty clear on this point.
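
For anyone who wants to try it, here is a minimal sketch of parsing a line with binary => 1 turned on. The sample line and its ISO-8859-1 byte are made up for illustration, and character decoding (for example via Encode) is deliberately left out; this only shows the parser accepting a byte outside the 0x20-0x7E range.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 lets fields contain bytes outside 0x20..0x7E
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag();

# Hypothetical sample line: an ISO-8859-1 e-acute (byte 0xE9)
# inside a quoted field
my $line = qq{1,"caf\xe9",3};

# parse() returns false on failure; error_diag() describes why
$csv->parse($line)
    or die "Parse failed: " . $csv->error_diag();

my @fields = $csv->fields();
printf "%d fields parsed, second field is %d bytes long\n",
    scalar @fields, length $fields[1];
```

The fields come back as raw bytes here, so any conversion to Perl's internal character strings would still be a separate step.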

Re^2: Parsing OpenOffice Spreadsheets
by Nomad (Pilgrim) on Nov 17, 2005 at 15:58 UTC
    Did you set binary => 1, which is required to use extended character sets?

    No, I didn't, partly because I didn't understand that 'binary' in Text::CSV_XS meant non-ASCII characters. I also didn't look into CSV as deeply as I might have, because I thought I'd better get to grips with the XML and use a pure UTF-8 solution.

    The beauty of OpenOffice is that it uses an open document format. I thought it was high time I got my head around it and made some headway in understanding it. The next task is, of course, to do the same thing with Gnumeric, but last time I looked its document format wasn't as well documented.

      Yep, I agree with your choice. Text::CSV_XS is not the right tool for your job. I'm just correcting the FUD in case someone is thinking of using it for a different task :-).