Hmm... That's odd. This is an existing Perl program that is using DBD::CSV and was working fine for months until last week. The user brought up the web-based upload form, uploaded their CSV file (exported from FoxPro), and got 0 records imported. We looked at the file they were uploading and discovered there were a few records that had embedded CR-LFs. Adding a regex to pre-process the uploaded file and remove the embedded CR-LFs solved the problem and allowed DBD::CSV to process the file.
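For anyone curious, a pre-processing step along those lines might look roughly like the sketch below. This is not our actual code: the name `strip_embedded_crlf` is made up for illustration, it replaces each embedded CR-LF with a space rather than deleting it, and it assumes any quotes inside a field are already doubled (`""`).

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): replace CR-LF pairs that fall
# inside a quoted field with a single space. Record-ending CR-LFs, which sit
# outside the quotes, are left alone.
# Assumption: embedded quotes are already escaped as "" pairs.
sub strip_embedded_crlf {
    my ($text) = @_;

    # Match one quoted field at a time: an opening quote, any run of
    # non-quote characters or doubled quotes, then the closing quote.
    $text =~ s{("(?:[^"]|"")*")}{ (my $field = $1) =~ s/\r\n/ /g; $field }ge;

    return $text;
}

my $raw = qq{1,"line one\r\nline two",x\r\n2,"ok",y\r\n};
print strip_embedded_crlf($raw);
```

The whole file has to be slurped into one string first, since a line-by-line read would already have split the field at the embedded CR-LF.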
If DBD::CSV can already handle that on its own, then I'm mystified as to why our regex would solve the problem. It looks like we'll need to investigate further. We'll double-check things on Monday to make sure we got the results we thought and will get back to you. Thanks!
| [reply] |
If DBD::CSV can already handle that on its own, then I'm mystified as to why our regex would solve the problem
Me too. Since diotalevi seems to have had the same problem, it's possible there's a bug somewhere. I'd really appreciate help in tracking it down. Could you and diotalevi both please let me know which versions of SQL::Statement and DBD::File you're using? The bug is more likely to be in one of them.
| [reply] |
Ah HA! (*smack* -> head) I've figured out the problem, and it has nothing to do with DBD::CSV.
The CSV file that we're processing is exported in an unusual manner. Embedded quotes aren't escaped in any way. We have a routine that pre-processes the CSV to find those embedded quotes and escapes them (such that a " becomes a "" pair). The processed CSV file is then handed off to DBD::CSV for normal processing.
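A minimal sketch of that kind of escaping pass follows. It is not the routine from our program: `escape_embedded_quotes` is a hypothetical name, and the rule it applies is only a heuristic. It treats a quote as a field delimiter when it sits at the start or end of a line or next to a comma, and doubles every other quote. It assumes the input has no quotes that are already escaped, and it will guess wrong when an embedded quote happens to sit directly beside a comma.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): double every quote that is not
# acting as a field delimiter.
# Heuristic: a quote is "embedded" when it has a character on both sides and
# neither neighbour is a comma or a line-ending character.
# Assumption: no quote in the input has been escaped yet.
sub escape_embedded_quotes {
    my ($line) = @_;
    $line =~ s/(?<=[^,\r\n])"(?=[^,\r\n])/""/g;
    return $line;
}

my $raw = '"He said "hi" to me",42';
print escape_embedded_quotes($raw), "\n";
```

Running the pass twice would double the quotes again, so it should only ever be applied to the raw export.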
Recently, the user added a field to the CSV file, which promptly broke our program. In the process of trying to figure out the problem (because, of course, they didn't tell us they had added a field, only that the program had stopped working), we discovered that CR-LFs were embedded in some fields. We then leaped to the (incorrect) conclusion that this was a recent occurrence and the cause of our problems.
We later learned about the addition of the new field. However, it turned out that the new field also triggered a bug in our pre-processing code, but we didn't know this at the time.
In trying to fix the supposed problem with the embedded CR-LFs, we had replaced the buggy pre-processing code. So, when we ran the program with the (correct) CR-LF replacement code, it worked. When we reverted to the (buggy) old pre-processing code, it stopped working. That led us to the incorrect conclusion that the CR-LF replacement code was needed.
So, in summary, DBD::CSV handles embedded CR-LFs fine. Mystery solved! (Of course, if there's a nifty setting to handle unescaped quotes, I'd be glad to learn of it!)
| [reply] |
That is odd. I stopped using this module for exactly this reason as well. I will try to get back to you about what your module was dying on.
| [reply] |