
Actually, despite my previous comment about split / join, when I wrote my own module to handle CSV, I stepped back a generation and went for character-by-character parsing, using state flags to track things like quote encapsulation and line completion.
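To illustrate the general idea, here is a minimal sketch of character-by-character parsing with a state flag. It is not the module described in this post: the name parse_csv_line and its arguments are hypothetical, it handles one record at a time, and it assumes RFC 4180-style quoting (a doubled quote inside a quoted field stands for a literal quote). Embedded newlines and the Excel quirks are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV record into its fields.
# $sep is the delimiter and defaults to a comma.
sub parse_csv_line {
    my ($line, $sep) = @_;
    $sep = ',' unless defined $sep;

    my @fields;
    my $field     = '';
    my $in_quotes = 0;                 # state flag: are we inside a quoted field?
    my @chars     = split //, $line;

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($in_quotes) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';     # doubled quote means a literal quote
                    $i++;              # skip the second quote of the pair
                }
                else {
                    $in_quotes = 0;    # closing quote
                }
            }
            else {
                $field .= $c;          # delimiters are ordinary text in here
            }
        }
        elsif ($c eq '"') {
            $in_quotes = 1;            # opening quote
        }
        elsif ($c eq $sep) {
            push @fields, $field;      # delimiter outside quotes ends the field
            $field = '';
        }
        else {
            $field .= $c;
        }
    }

    # Still inside quotes at end of input: the record is incomplete.
    die "unterminated quoted field\n" if $in_quotes;

    push @fields, $field;
    return @fields;
}
```

Because the delimiter is just a parameter, the same loop reads semicolon-delimited data with parse_csv_line($line, ';'). A fuller version would ask for the next physical line instead of dying when the quote flag is still set at end of input, which is where tracking line completion comes in.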

The Text::CSV module which existed at the time was grossly underpowered and did not work on some of the simplest CSV files I'd exported from Excel. (I would be remiss not to point out that it has matured nicely since.)

I was, at the time, prohibited from working on open source projects (similar to Ben Tilly) without first going through an administrative and legal process that usually took 3-6 months to complete.

I needed working code within a week, with the flexibility to add full functionality on a more relaxed schedule.

So my home-spun CSV module was born.

I coded it as precisely as I could to the specifications I could find, drawing primarily from the CSV Wiki page, and probably also RFC 4180. As is frequently noted and quoted around the Internet, the CSV standard is not perfectly well-defined -- although in my research it became clear that more of it is sufficiently defined than most people give it credit for.

I would also point out that, unsurprisingly, Microsoft failed to adhere to one or two items that actually were in the CSV specification in its exports from Excel (whatever version we were using then), which required a few extra edge cases to be written into the module. I think at one point I was even considering a user-specified "Microsoft Flag" parameter to direct the parser to either follow the CSV standard or use what worked with Microsoft Excel; I am not sure whether I found an automated way to handle that corner case.

Anyway -- I have on two occasions run into CSV files it did not properly parse, and I have bugs registered in my change control system to address them someday. Alas, it is medium-low on my priority scheme, and has not seen any attention since June of 2008.

Plus, with Tux having written a brilliant alternative Text::CSV_XS module which, knowing Tux, probably was at least as picky about sticking to the specifications as I would have been, my motivation for fixing my own module is pretty low -- the next time I need to decode a CSV file and my own module doesn't handle it, I just might refactor to use Text::CSV_XS.

Anyway, if you could show the code you currently use to parse and/or build SCSVs, someone here might be able to find a quick way to flex it up so it can switch its delimiter without a lot of effort.
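In the meantime, here is a minimal sketch of switching the delimiter with Text::CSV_XS rather than hand-rolled code. It assumes the module is installed from CPAN. The helper name convert_delimiter and the sample data are made up for illustration; sep_char, binary, eol, auto_diag, getline and print come from the module's documented interface.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: copy every record from $in to $out,
# reading with delimiter $from and writing with delimiter $to.
sub convert_delimiter {
    my ($in, $out, $from, $to) = @_;

    my $reader = Text::CSV_XS->new(
        { binary => 1, sep_char => $from, auto_diag => 1 });
    my $writer = Text::CSV_XS->new(
        { binary => 1, sep_char => $to, eol => "\n", auto_diag => 1 });

    # getline returns an array reference per record, undef at end of file.
    while (my $row = $reader->getline($in)) {
        $writer->print($out, $row);    # re-quotes fields where needed
    }
    return;
}

# Usage with in-memory filehandles; the sample data is invented.
my $input = qq{name;note\n"Smith; John";"likes, commas"\n};
open my $in, '<', \$input or die "cannot open input: $!";
my $output = '';
open my $out, '>', \$output or die "cannot open output: $!";

convert_delimiter($in, $out, ';', ',');
close $out;
print $output;
```

Using one object for reading and another for writing keeps the two delimiters independent, so a field that contains the new delimiter gets quoted on output instead of splitting the record.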

In reply to Re^3: Semicolon delimited to Comma delimited by marinersk
in thread Semicolon delimited to Comma delimited by swatzz
