in reply to regular expression (search and destroy)
You have already had all the obligatory warnings about not using a module (or at least copying from one) to do this, so let me be the one to caution you: if you do opt to use a module, look at the candidates very carefully. They are not all equal.
The first thing to check is that the module's idea of what constitutes CSV data is the same as Excel's. For example, Excel can generate CSV data with quoted fields that contain embedded newlines. And don't blame MS for this extension to the standard (if you can find a standard definition for CSV); many other spreadsheets also do this, going right back to the once ubiquitous Lotus 1-2-3, I believe. To date, tilly's Text::xSV is the only module I have found that will handle this.
If you have large volumes of CSV to parse, be aware that many of the CSV modules around are less than sparkling in the performance department. The best performer I have found is Text::CSV_XS, but it fails to handle embedded newlines. In any case, because it is an XS module, it will not be useful to you if you cannot or will not install modules.
It is possible to do this yourself with regexes, but it is quite difficult to get it right and cover all the edge cases.
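To give a feel for what is involved, here is a minimal sketch of a hand-rolled parser, not a replacement for a tested module. The sub name `parse_csv` is made up for illustration, and the code assumes Excel-style quoting: fields may be wrapped in double quotes, a literal quote inside a quoted field is doubled, and quoted fields may contain commas and newlines. It reads the whole input as one string and does not attempt to diagnose malformed data.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from any CPAN module): parse a complete
# CSV string into a list of records, each an array ref of fields.
sub parse_csv {
    my ($text) = @_;
    return () unless defined $text and length $text;

    my ( @records, @fields );
    pos($text) = 0;

    while (1) {
        my $field = '';
        if ( $text =~ /\G"((?:[^"]|"")*)"/gc ) {
            # Quoted field: may span lines; undouble embedded quotes.
            ( $field = $1 ) =~ s/""/"/g;
        }
        elsif ( $text =~ /\G([^,\r\n]*)/gc ) {
            # Unquoted field: runs up to the next comma or line ending.
            $field = $1;
        }
        push @fields, $field;

        # A comma means another field follows in the same record.
        next if $text =~ /\G,/gc;

        # Otherwise the record is complete; consume its line ending, if any.
        push @records, [@fields];
        @fields = ();
        $text =~ /\G(?:\r\n|\r|\n)/gc;
        last if pos($text) >= length $text;
    }
    return @records;
}

# Sample data: a comma, a newline and doubled quotes inside quoted fields.
my $csv  = qq{name,note\n"Smith, J","line one\nline two"\n"say ""hi""",\n};
my @rows = parse_csv($csv);

# Splitting on newlines alone miscounts, because one record spans two lines.
my @naive = split /\n/, $csv;
printf "naive lines: %d, parsed records: %d\n", scalar @naive, scalar @rows;
```

On this sample the naive split sees four lines where there are only three records. Even this sketch ignores real-world details such as alternative separators, whitespace around quoted fields, and unterminated quotes, which is where the edge cases start to pile up.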
Re: Re: regular expression (search and destroy)
by giulienk (Curate) on Nov 13, 2003 at 07:32 UTC