Since you have already had all the obligatory warnings about rolling your own rather than using a module (or at least copying from one), let me be the one to caution you that if you do opt for a module, look at the candidates very carefully. They are not all equal.

The first thing to check is that the module's idea of what constitutes CSV data is the same as Excel's. For example, Excel can generate CSV data with quoted fields that contain embedded newlines. And don't blame MS for this extension to the standard (if you can find a standard definition of CSV); many other spreadsheets also do this, going right back, I believe, to the once-ubiquitous Lotus 1-2-3. To date, Tilly's Text::xSV is the only module I have found that handles this.

If you have large volumes of CSV to parse, be aware that many of the CSV modules around are less than sparkling in the performance department. The best performer I have found is Text::CSV_XS, but it fails to handle embedded newlines. In any case, being XS (it has a C component that must be compiled), it will be of no use to you if you cannot or will not install modules.

It is possible to do this yourself with regexes, but it is quite difficult to get it right and cover all the edge cases.
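To give a feel for what "getting it right" involves, here is a minimal regex-based sketch. It is not a substitute for a well-tested module, and `parse_csv` is a hypothetical helper name. It assumes comma separators, double-quote quoting with `""` for a literal quote, quoted fields that may span lines as Excel writes them, and the whole file already slurped into one string. It makes no attempt at other dialects (other separators, backslash escapes, whitespace around the quotes) and simply dies on input it does not understand.

```perl
use strict;
use warnings;

# Hypothetical helper, a sketch only: parse a whole CSV document held in $text.
# Assumes: comma separators, double-quoted fields with "" as an escaped quote,
# and quoted fields that may contain commas and (CR)LF newlines.
# Returns a reference to an array of records, each an array ref of fields.
# Dies on input it cannot match (e.g. a stray quote inside an unquoted field).
sub parse_csv {
    my ($text) = @_;
    my (@records, @fields);
    return \@records if $text eq '';

    # \G anchors each match where the previous one stopped; /c keeps pos()
    # on failure so we can report where parsing went wrong.
    while ($text =~ /\G(?:"((?:[^"]|"")*)"|([^,"\r\n]*))(,|\r?\n|\z)/gc) {
        my ($quoted, $bare, $sep) = ($1, $2, $3);

        # Quoted field: the capture already excludes the outer quotes;
        # turn each doubled quote back into a single one.
        my $field = defined $quoted ? $quoted : $bare;
        $field =~ s/""/"/g if defined $quoted;
        push @fields, $field;

        next if $sep eq ',';    # more fields follow in this record

        # Newline or end of input: the record is complete.
        push @records, [@fields];
        @fields = ();
        return \@records if pos($text) >= length $text;
    }
    die 'Malformed CSV near offset ' . (pos($text) // 0) . "\n";
}

# Example: a header row, then a record whose second field spans two lines.
my $demo_rows = parse_csv(qq{name,note\n"Smith, J","said ""hi""\nthen left"\n});
for my $row (@$demo_rows) {
    # Show embedded newlines as \n so each record prints on one line.
    print join(' | ', map { (my $f = $_) =~ s/\n/\\n/g; $f } @$row), "\n";
}
```

Because each match is anchored with `\G` and resumes exactly where the previous field ended, a quoted field is free to swallow newlines; reading the file line by line with `while (<>)` and splitting each line would instead cut such a field in two, which is exactly the Excel case described above.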


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail


In reply to Re: regular expression (search and destroy) by BrowserUk
in thread regular expression (search and destroy) by data67
