Ah HA! (*smack* -> head) I've figured out the problem, and it has nothing to do with DBD::CSV.

The CSV file that we're processing is exported in an unusual manner: embedded quotes aren't escaped in any way. We have a routine that pre-processes the CSV, finds those embedded quotes, and escapes them (such that a " becomes a "" pair). The processed CSV file is then handed off to DBD::CSV for normal processing.
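
To illustrate the kind of pre-processing described above, here is a minimal sketch. It is not our actual routine. The name escape_embedded_quotes is made up for illustration, and the sketch assumes comma-separated data where every field is wrapped in double quotes and an embedded quote never sits directly next to a comma or a line break.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not the routine from the post).
# Assumptions: comma separator, every field is quoted, and an embedded quote
# is never immediately next to a comma or a line break.
sub escape_embedded_quotes {
    my ($csv) = @_;

    # Treat a quote as a field delimiter when it opens a field (nothing,
    # a comma, or a newline before it) or closes one (a comma, a line
    # break, or nothing after it). Every other quote is doubled.
    $csv =~ s/(?<=[^,\n])"(?=[^,\r\n])/""/g;

    return $csv;
}

# Usage: the quotes around "The Builder" are embedded, so they get doubled.
print escape_embedded_quotes(qq{"Bob "The Builder" Smith","ok"\r\n});
# prints: "Bob ""The Builder"" Smith","ok"
```

Because this is only a heuristic, a field that really does contain a quote right before a comma or line break would be misread as ending there.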

Recently, the user added a field to the CSV file, which promptly broke our program. In the process of trying to figure out the problem (because, of course, they didn't tell us they had added a field, only that the program had stopped working), we discovered that CR-LFs were embedded in some fields. We then leaped to the (incorrect) conclusion that this was a recent occurrence and the cause of our problems.

We later learned about the addition of the new field. It turned out that the new field had also triggered a bug in our pre-processing code, but we didn't know this at the time.

In trying to fix the supposed problem with the embedded CR-LFs, we had replaced the buggy pre-processing code. So, when we ran the program with the (correct) CR-LF replacement code, it worked. When we reverted to the (buggy) old pre-processing code, it stopped working. That led us to the incorrect conclusion that the CR-LF replacement code was needed.

So, in summary, DBD::CSV handles embedded CR-LFs fine. Mystery solved! (Of course, if there's a nifty setting to handle unescaped quotes, I'd be glad to learn of it!)

Wally Hartshorn


In reply to DBD::CSV handles embedded CR-LFs fine! by Wally Hartshorn
in thread Finding CR-LFs within quoted CSV fields by Wally Hartshorn
