emgrasso has asked for the wisdom of the Perl Monks concerning the following question:

I have some CSV data to process that includes multiline fields, some of which begin with CRs. In general, I need to be able to identify two or three kinds of lines in the data from my input file:
lines ending in " without a preceding ,
(possibly) lines ending in ","
lines ending in anything else.
My regex skills for dealing with punctuation at ends of lines seem to be a bit rusty. I'd appreciate any suggestions.

Replies are listed 'Best First'.
Re: Regex help for CSV Multiline handling
by dragonchild (Archbishop) on Feb 25, 2005 at 00:20 UTC
    Use Text::xSV - it was designed for dealing with this kind of situation.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Regex help for CSV Multiline handling
by jZed (Prior) on Feb 25, 2005 at 00:46 UTC
    Use Text::CSV_XS, with binary=>1 and it will handle your CSV much faster and cleaner than any regex you can come up with.
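A minimal sketch of that suggestion, assuming Text::CSV_XS is installed from CPAN. The sample data and the variable names are made up for illustration; a real script would open the input file instead of an in-memory string.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 allows embedded newlines and carriage returns inside quoted fields.
# auto_diag => 1 makes the parser report errors instead of failing silently.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Hypothetical sample: one field spans two lines, another begins with a CR.
my $data = qq{id,comment\n1,"first line\nsecond line"\n2,"\rstarts with CR"\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    # $row is an array reference holding the fields of one logical record,
    # however many physical lines that record occupied.
    push @rows, $row;
}
close $fh;

printf "%d records read\n", scalar @rows;
```

The parser, not the calling code, decides where each record ends, so no per-line classification is needed.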
Re: Regex help for CSV Multiline handling
by perlfan (Parson) on Feb 24, 2005 at 23:00 UTC
    My regex skills are not so polished either, but let me know how I do:
    1: m/^(.*)[^,]"\s*$/
    2: m/^(.*),\s*$/
    3: m/^(.*)$/
    Note: for #1, I am not sure whether "without a preceding ," means no commas anywhere in the line or just no comma immediately before the ".

    I suggest not creating one regex to "rule them all"; instead, test each line against the patterns in the order of precedence you want. For example, #3 matches every line, so it is your catch-all and should be tested last.
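The ordered-check idea could be sketched as below. The function name classify_line and the three labels are hypothetical, and the sketch assumes "ending in" ignores trailing whitespace. It uses a negative lookbehind rather than [^,] so that a line consisting of a lone " still counts as the first kind. It looks only at the end of each line, so it cannot tell a closing quote from an escaped "" pair.

```perl
use strict;
use warnings;

# Classify one physical line by how it ends.
sub classify_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;    # drop the line terminator before testing

    # Most specific pattern first: the line ends in ","
    return 'quote_comma_quote' if $line =~ /","\s*\z/;

    # The line ends in " with no comma immediately before it
    return 'closing_quote' if $line =~ /(?<!,)"\s*\z/;

    # Catch-all: everything else
    return 'other';
}

for my $line ("end of a field\"\n", "a\",\"b\",\"\n", "plain text\n") {
    print classify_line($line), "\n";
}
```

Each test returns as soon as it matches, so the order of the statements is the order of precedence.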

    Your question is actually very general, so if you are looking for more specific help, you will need to give more detail.