in reply to DBD::CSV file size limitation?

I (maintainer of DBD::CSV and its prereqs) am not aware of any arbitrary limits on file size, and there are certainly no limits on the number of records. It can be slow and a memory hog for large files, but I'm not sure how that would relate to your problem. One thing you might try is reading the file directly with Text::CSV_XS and feeding it line by line to DBD::CSV, which would cut down on memory. I also wonder how your script and large file would fare on a different machine. Please let me know how this all turns out.
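
For what it's worth, here is a minimal sketch of the reading half of that approach, parsing one record at a time with Text::CSV_XS; the file name is just a placeholder:

    use strict;
    use warnings;
    use Text::CSV_XS;

    # Parse one record at a time; only the current row is held in memory.
    my $csv = Text::CSV_XS->new({ binary => 1 });
    open my $fh, "<", "big.csv" or die "big.csv: $!";
    while (my $row = $csv->getline($fh)) {
        # work with the fields in @$row here
    }
    close $fh;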

Re^2: DBD::CSV file size limitation?
by holli (Abbot) on Dec 07, 2005 at 19:01 UTC
    Could it be a newline issue that's silently fixed when the OP cuts the file contents in half and "saves as" the halves (e.g. using his favourite text editor)?


    holli, /regexed monk/
      Nice thought, I'll try it using a hex editor.... nm, I just checked the file, and it does end with 0D 0A.
Re^2: DBD::CSV file size limitation?
by beachbum (Beadle) on Dec 07, 2005 at 19:01 UTC

    Thanks for the quick reply.
    I have tried running it on 3 different machines, ranging from a 1.3 GHz box with 1 GB of RAM to a new dual-processor box with 4 GB. The RAM is not being consumed; however, at least on the first box, the processor has always maxed out while reading the files.

    This particular file does have 3 distinct record types in it, each with a different set of fields and a different length. This hasn't been an issue in the past, as my query specifies WHERE RECORD_TYPE = n.

    I would like to try your suggestion of feeding the data into DBD::CSV line by line, but I'm a little confused as to how to do that. Am I able to issue a DBI->connect() to a string instead of a file?
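
    (For reference, the usual DBD::CSV pattern is to connect to a directory and map a table name onto a file in it, rather than to a string; a minimal sketch of that pattern, with a hypothetical path and table name:)

        use strict;
        use warnings;
        use DBI;

        # Each CSV file under f_dir can be queried as a table.
        my $dbh = DBI->connect("dbi:CSV:f_dir=/path/to/dir", undef, undef,
                               { RaiseError => 1 });
        $dbh->{csv_tables}{mydata} = { file => "big.csv" };   # table "mydata"

        my $sth = $dbh->prepare("SELECT * FROM mydata WHERE RECORD_TYPE = 1");
        $sth->execute;
        while (my $row = $sth->fetchrow_hashref) {
            # process one result row at a time
        }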

      > This particular file does have 3 distinct record types
      > in it, each with varying fields and lengths

      That may be a problem. What record types are they? It's possible that the module assumes one record layout and then treats the other record types as if they had the same layout, getting confused about record boundaries and trying to build a single huge record.
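
      One cheap way to sanity-check the record boundaries is to parse the file with Text::CSV_XS and look at how many records come back and how wide the widest one is; a sketch, with the file name as a placeholder:

          use strict;
          use warnings;
          use Text::CSV_XS;

          my $csv = Text::CSV_XS->new({ binary => 1 });
          open my $fh, "<", "big.csv" or die "big.csv: $!";

          my ($count, $widest) = (0, 0);
          while (my $row = $csv->getline($fh)) {
              $count++;
              $widest = @$row if @$row > $widest;   # @$row in scalar context is the field count
          }
          printf "%d records parsed, widest record has %d fields\n", $count, $widest;
          close $fh;

      A merged record would show up as an unexpectedly wide record or an unexpectedly low record count.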

        The record types differ in that type '0' has 50 fields, type '1' has 22 fields, and type '2' has 11 fields. The first 5 fields of each record are common, with field 5 being the record type (0|1|2). There is no binary data in the records.

        I am only interested in one type at a time, and have been successfully getting that until now with WHERE type = 1.
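
        Tying this back to the line-by-line suggestion above, one way to apply it here would be to pre-filter the type-1 records into a smaller file with Text::CSV_XS and point DBD::CSV at that instead. A minimal sketch, assuming field 5 (index 4) holds the record type as described above; file names are placeholders, and a header row, if the file has one, would need to be copied through as well:

            use strict;
            use warnings;
            use Text::CSV_XS;

            # eol => "\r\n" keeps the CRLF line endings the original file uses.
            my $csv = Text::CSV_XS->new({ binary => 1, eol => "\r\n" });
            open my $in,  "<", "big.csv"   or die "big.csv: $!";
            open my $out, ">", "type1.csv" or die "type1.csv: $!";

            while (my $row = $csv->getline($in)) {
                $csv->print($out, $row) if $row->[4] eq '1';   # keep only type-1 records
            }
            close $in;
            close $out;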