beachbum has asked for the wisdom of the Perl Monks concerning the following question:

I wrote some scripts that use DBD::CSV 0.22 to read some fields from various flat files. These scripts have been running monthly without a problem until this month, when one of them failed with this error:

Error while reading file .\filename: Bad file descriptor at C:/Perl/site/lib/DBD/CSV.pm line 210, <GEN4> chunk 74885.

The script that is failing reads the largest of the files, and this month's file is the largest to date at (only) 54 MB, 2 MB larger than last month's. I split the file in half and ran the script on each of the two new files without a problem.

Am I missing a file size limitation, or maybe a number of records limitation in this module?

Re: DBD::CSV file size limitation?
by jZed (Prior) on Dec 07, 2005 at 18:41 UTC
    I (maintainer of DBD::CSV and its prereqs) am not aware of any arbitrary limit on file size, and there is certainly no limit on the number of records. It can be slow and a memory hog for large files, but I'm not sure how that would relate to your problem. One thing you might try is reading the file directly with Text::CSV_XS and feeding it line by line to DBD::CSV, which would cut down on memory. I also wonder how your script and large file would fare on a different machine. Please let me know how this all turns out.
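    A minimal sketch of that line-by-line approach, assuming the record type sits in the first field and using made-up file, table, and column names: Text::CSV_XS parses the big file one row at a time, and only the matching rows are inserted into a small DBD::CSV working table.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV_XS;
        use DBI;

        my $infile = 'filename';    # hypothetical input file name

        my $csv = Text::CSV_XS->new({ binary => 1 })
            or die Text::CSV_XS->error_diag;
        my $dbh = DBI->connect('dbi:CSV:f_dir=.', undef, undef,
                               { RaiseError => 1 });

        # Small working table holding only the record type we care about.
        unlink 'type_n';            # remove any stale table file from a previous run
        $dbh->do('CREATE TABLE type_n '
               . '(RECORD_TYPE CHAR(2), FIELD_A CHAR(64), FIELD_B CHAR(64))');
        my $ins = $dbh->prepare('INSERT INTO type_n VALUES (?, ?, ?)');

        open my $fh, '<', $infile or die "Cannot open $infile: $!";
        while (my $row = $csv->getline($fh)) {             # one row at a time, low memory
            next unless @$row >= 3 and $row->[0] eq '1';   # keep record type 1 only
            $ins->execute(@{$row}[0 .. 2]);
        }
        $csv->eof or $csv->error_diag;
        close $fh;
        $dbh->disconnect;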
      Could it be a newline issue that is silently fixed when the OP splits the file and "saves as" the two halves (e.g. using his favourite text editor)?


      holli, /regexed monk/
        Nice thought, I'll try it with a hex editor... never mind, I just checked the file, and it does end with 0D 0A.

      Thanks for the quick reply.
      I have tried running it on three different machines, ranging from a 1.3 GHz box with 1 GB of RAM to a new dual-processor box with 4 GB. The RAM is not being consumed; however, at least on the first box, the processor has always maxed out while reading the files.

      This particular file does have 3 distinct record types in it, each with varying fields and lengths. This hasn't been an issue in the past as my query specifies WHERE RECORD_TYPE = n.
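      A rough sketch of that kind of DBD::CSV query, where the table mapping, file name, column names, and record type value are placeholders rather than the real ones:

          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect('dbi:CSV:f_dir=.', undef, undef,
                                 { RaiseError => 1 });

          # Map a table name onto the flat file; if the file has no header
          # row, the column names could also be supplied via col_names here.
          $dbh->{csv_tables}{monthly} = { file => 'filename' };

          my $sth = $dbh->prepare(
              'SELECT FIELD_A, FIELD_B FROM monthly WHERE RECORD_TYPE = ?');
          $sth->execute(1);                       # the record type of interest
          while (my @fields = $sth->fetchrow_array) {
              print join(',', @fields), "\n";
          }
          $dbh->disconnect;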

      I would like to try your suggestion of feeding the data into DBD::CSV line by line, but I'm a little confused as to how to do that. Am I able to issue a DBI->connect() to a string instead of a file?

        > This particular file does have 3 distinct record types
        > in it, each with varying fields and lengths

        That may be a problem. What record types are they? It's possible that the module assumes a given record type and then treats the other record types as if they had the same layout, getting confused about record boundaries and trying to build a single huge record.

Re: DBD::CSV file size limitation?
by beachbum (Beadle) on Dec 09, 2005 at 23:43 UTC

    Solution:
    It appears that having different record types (a varying number and size of fields) in a single file was the issue. I generated an even larger file, 67 MB, all with the same record type, and it was read just fine.
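    One possible pre-pass, sketched below on the assumption that the record type is the first field and the file uses CRLF line endings (file names are made up): split the mixed file into one file per record type with Text::CSV_XS before handing anything to DBD::CSV, so each file has a uniform layout.

        use strict;
        use warnings;
        use Text::CSV_XS;

        my $infile = 'filename';    # hypothetical mixed-type input file
        my $csv = Text::CSV_XS->new({ binary => 1, eol => "\r\n" })
            or die Text::CSV_XS->error_diag;

        open my $in, '<', $infile or die "Cannot open $infile: $!";

        my %out;    # one output handle per record type, opened on demand
        while (my $row = $csv->getline($in)) {
            my $type = $row->[0];               # record type in the first field
            unless ($out{$type}) {
                open $out{$type}, '>', "$infile.type_$type"
                    or die "Cannot create $infile.type_$type: $!";
            }
            $csv->print($out{$type}, $row);     # write the row to its type's file
        }
        $csv->eof or $csv->error_diag;

        close $_ for $in, values %out;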

    Thanks jZed!