My apologies if this is an FAQ, but I've been unable to find an answer.

I have a comma-separated values (CSV) file exported from a FoxPro database. The text fields within the CSV file are enclosed within double-quotes.

Unfortunately, some of the fields contain embedded CR-LF characters. The DBD::CSV module interprets those CR-LF characters as end-of-record markers. I haven't found any way to tell DBD::CSV that a CR-LF pair within a quoted field is not an end-of-record marker, so I'm using a regex to convert the CR-LF pairs into HTML <br> tags. (The text will be displayed in a browser, so that's what I want anyway.)

The code I've developed to do this is ugly and not bulletproof. It assumes that a quote, followed by a CR-LF, followed by a quote should be treated as the closing quote of the last field of one record, followed by an end-of-record, followed by the opening quote of the first field of the next record. This is not always true.

    {
        local $/;           # slurp mode; restored when the block exits
        $slurp = <$fh>;     # slurp up the file

        # append a bogus final quote to the end of the file
        $slurp .= '"';

        # replace the end-of-record CR-LFs
        $slurp =~ s/"\r\n"/"__EOR__"/g;

        # replace the other CR-LFs with <br> tags
        $slurp =~ s/\r\n/<br>/g;

        # restore the end-of-record CR-LFs
        $slurp =~ s/"__EOR__"/"\r\n"/g;

        # remove the bogus final quote
        $slurp =~ s/"$//;
    }

The correct way to do this would be to loop through the file, counting opening quotes and closing quotes, replacing any CR-LFs within an opening quote/closing quote pair with HTML <br> tags.
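For illustration, here is a minimal sketch of that quote-tracking approach. It assumes the whole file fits in memory and that any quote embedded in a field is doubled (""), which I have not verified for FoxPro exports. The name crlf_to_br_in_quotes is hypothetical, made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace CR-LF pairs that fall inside double-quoted
# fields with <br>, leaving record-terminating CR-LFs untouched.
# Assumption: embedded quotes are doubled (""), so every quote character
# simply toggles the inside/outside state.
sub crlf_to_br_in_quotes {
    my ($text) = @_;
    my $in_quotes = 0;
    my $out       = '';

    # The capturing group makes split return the quote characters too,
    # so the pieces can be reassembled without losing anything.
    for my $chunk (split /(")/, $text) {
        if ($chunk eq '"') {
            $in_quotes = !$in_quotes;    # entering or leaving a quoted field
        }
        elsif ($in_quotes) {
            $chunk =~ s/\r\n/<br>/g;     # only touch text inside quotes
        }
        $out .= $chunk;
    }
    return $out;
}

# Usage with made-up sample data: the first record has an embedded CR-LF.
my $csv = qq{1,"line one\r\nline two","x"\r\n2,"plain","y"\r\n};
print crlf_to_br_in_quotes($csv);
```

A doubled quote toggles the state twice, so the scan stays inside the field. A lone unescaped quote inside a field would throw the state off for the rest of the file.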

Doing this as a while() loop seems awkward. Is there some elegant regex that would handle this?

Wally Hartshorn

