in reply to Finding CR-LFs within quoted CSV fields

You are wrong about DBD::CSV. It fully supports newlines embedded in quoted fields. It does this by default; no special action is required on your part. (I am its maintainer, so I should know.) If you weren't able to get it to work, I suspect that your data uses different line endings than your code expects. Please show your data and your code.
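
One quick way to test the line-ending theory is to count which endings actually occur in the raw file. The following is only a minimal sketch in core Perl. `count_line_endings` is a hypothetical helper name and is not part of DBD::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBD::CSV): count each line-ending
# style found in a chunk of raw data.
sub count_line_endings {
    my ($data) = @_;
    my %count = (crlf => 0, lf => 0, cr => 0);
    $count{crlf}++ while $data =~ /\r\n/g;        # CR-LF pairs
    $count{lf}++   while $data =~ /(?<!\r)\n/g;   # LF not preceded by CR
    $count{cr}++   while $data =~ /\r(?!\n)/g;    # CR not followed by LF
    return \%count;
}

# Usage sketch: read the file with binmode so no layer translates
# the line endings before they are counted.
#   open my $fh, '<', $file or die "Cannot open $file: $!";
#   binmode $fh;
#   my $raw = do { local $/; <$fh> };
#   my $c = count_line_endings($raw);
#   print "CRLF=$c->{crlf} LF=$c->{lf} CR=$c->{cr}\n";
```

Note that CR-LFs embedded inside quoted fields are counted too, so a mix of styles is not by itself proof of a problem. It only shows what the file really contains.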

DBD::CSV and embedded CR-LFs
by Wally Hartshorn (Hermit) on Jun 11, 2004 at 21:15 UTC

    Hmm... That's odd. This is an existing Perl program that is using DBD::CSV and was working fine for months until last week. The user brought up the web-based upload form, uploaded their CSV file (exported from FoxPro), and got 0 records imported. We looked at the file they were uploading and discovered there were a few records that had embedded CR-LFs. Adding a regex to pre-process the uploaded file and remove the embedded CR-LFs solved the problem and allowed DBD::CSV to process the file.

    If DBD::CSV can already handle that on its own, then I'm mystified as to why our regex would solve the problem. It looks like we'll need to investigate further. We'll double-check things on Monday to make sure we got the results we thought we did, and will get back to you. Thanks!

    Wally Hartshorn

      If DBD::CSV can already handle that on its own, then I'm mystified as to why our regex would solve the problem
      Me too. Since diotalevi seems to have had the same problem, it's possible there's a bug somewhere. I'd really appreciate help in tracking it down. Could you and diotalevi both please let me know which versions of SQL::Statement and DBD::File you're using? The bug is more likely to be in one of them.

        Ah HA! (*smack* -> head) I've figured out the problem, and it has nothing to do with DBD::CSV.

        The CSV file that we're processing is exported in an unusual manner. Embedded quotes aren't escaped in any way. We have a routine that pre-processes the CSV to find those embedded quotes and escapes them (such that a " becomes a "" pair). The processed CSV file is then handed off to DBD::CSV for normal processing.
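
The quote-escaping step described above could look something like the sketch below. This is not the routine from the post, which is not shown. `escape_embedded_quotes` is a hypothetical name, and the heuristic assumes that every field is quoted, that the separator is a comma, that no quotes are already doubled, and that an embedded quote never sits directly next to a comma or a line break.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original pre-processing routine.
# Assumptions: all fields are quoted, the separator is a comma, no
# quotes are already escaped, and an embedded quote never touches a
# comma or a line break.
sub escape_embedded_quotes {
    my ($csv) = @_;
    # A quote is treated as a field delimiter when it touches a comma,
    # a CR or LF, or either end of the data. Any other quote is taken
    # to be embedded and is doubled.
    $csv =~ s/(?<=[^,\r\n])"(?=[^,\r\n])/""/g;
    return $csv;
}
```

The heuristic misfires when a field contains a quote right next to a comma (for example `"say "yes", then go"`) or right next to an embedded line break, so input like that still needs to be fixed at the export side.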

        Recently, the user added a field to the CSV file, which promptly broke our program. In the process of trying to figure out the problem (because, of course, they didn't tell us they had added a field, only that the program had stopped working), we discovered that CR-LFs were embedded in some fields. We then leaped to the (incorrect) conclusion that this was a recent occurrence and the cause of our problems.

        We later learned about the addition of the new field. It turned out that the new field had also triggered a bug in our pre-processing code, although we didn't know this at the time.

        In trying to fix the supposed problem with the embedded CR-LFs, we had replaced the buggy pre-processing code. So, when we ran the program with the (correct) CR-LF replacement code, it worked. When we reverted to the (buggy) old pre-processing code, it stopped working. That led us to the incorrect conclusion that the CR-LF replacement code was needed.

        So, in summary, DBD::CSV handles embedded CR-LFs fine. Mystery solved! (Of course, if there's a nifty setting to handle unescaped quotes, I'd be glad to learn of it!)

        Wally Hartshorn

Re^2: Finding CR-LFs within quoted CSV fields
by diotalevi (Canon) on Jun 11, 2004 at 21:17 UTC
    That is odd. I stopped using this module for exactly this reason as well. I will try to get back to you about what your module was dying on.
      Thanks, I appreciate the help. It's most likely in SQL::Statement or DBD::File rather than DBD::CSV itself.

      Update: diotalevi msg'd me that he had been thinking of a different module, and that he doesn't actually use DBD::CSV.