http://qs1969.pair.com?node_id=1124430


in reply to Re^2: Semicolon delimited to Comma delimited
in thread Semicolon delimited to Comma delimited

Actually, despite my previous comment about split / join, when I wrote my own module to handle CSV, I stepped back a generation and went for character-by-character parsing, and the use of state flags to track things like quote encapsulation and line completion.

The Text::CSV module which existed at the time was grossly underpowered and did not work on some of the simplest CSV files I'd exported from Excel. (I would be remiss to fail to point out that it has matured nicely since.)

I was, at the time, prohibited from working on open source projects (similar to Ben Tilly) without prefacing it by an administrative and legal process that usually took 3-6 months to complete.

I needed working code within a week, with the flexibility to add full functionality on a more relaxed schedule.

So my home-spun CSV module was born.

I coded it as precisely to the specifications I could find, drawing primarily from its Wiki page, and probably also RFC 4180. As is frequently noted and quoted around the Internet, the CSV standard is not perfectly well-defined -- although in my research it became clear that more of it was sufficiently-defined than that for which most give it credit.

I would also point out that, without surprise, Microsoft failed to adhere to one or two items that actually were in the CSV specification in its exports from Excel (whatever version we were using then), which required a few extra edge cases to be written into the module. I think at one point I was even down to considering a user-specified "Microsoft Flag" parameter to direct the parser to either follow the CSV standard or to use what worked with Microsoft Excel; not sure if I found an automated way to handle that corner case or not.

Anyway -- I have on two occasions run into CSV files it did not properly parse, and I have bugs registered in my change control system to address them someday. Alas, it is medium-low on my priority scheme, and has not seen any attention since June of 2008.

Plus, with Tux having written a brilliant alternative Text::CSV_XS module which, knowing Tux, probably was at least as picky about sticking to the specifications as I would have been, my motivation for fixing my own module is pretty low -- the next time I need to decode a CSV file and my own module doesn't handle it, I just might refactor to use Text::CSV_XS.

Anyway, if you could show the code you currently use to parse and/or build SCSVs, someone here might be able to find a quick way to flex it up so it can switch its delimiter without a lot of effort.