batcater98 has asked for the wisdom of the Perl Monks concerning the following question:

I have a 39GB flat file that contains one record per row, with 5 fields in each row. I am trying to read in this flat file, parse each field, and load it into a DB. The fields on each row are separated by a "," and there is a \n at the end of each row. I have tried just loading it into an array - no luck. I also tried using Tie::File, but either I am using it wrong or the commas in each row are throwing it off, and I cannot even get a record count back. Suggestions? Thanks, Ad.

Replies are listed 'Best First'.
Re: Reading 39GB Flat File?
by almut (Canon) on Nov 12, 2009 at 16:40 UTC

    Why don't you read it line by line in a while loop (as the rows are delimited by \n)? This way, the size of the file effectively wouldn't matter...  Or does the "parse each field and load into a DB" step somehow require combining info across multiple rows, so that you need to have them all accessible at the same time?
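
A minimal sketch of the line-by-line approach, assuming fields never contain embedded commas or quotes (if they can, a CSV parser such as Text::CSV would be the safer choice). The names parse_record and process_stream are hypothetical, and the database insert is left as a placeholder callback because the target DB is not stated in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one record into its five fields.
# Assumption: no embedded commas or quoted fields, so a plain split is enough.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    return @fields == 5 ? \@fields : undef;
}

# Read a file handle one line at a time and call $handler for each good
# record. Only the current line is held in memory, so file size is irrelevant.
sub process_stream {
    my ($fh, $handler) = @_;
    my ($good, $bad) = (0, 0);
    while (my $line = <$fh>) {
        my $fields = parse_record($line);
        if ($fields) {
            $handler->($fields);
            $good++;
        }
        else {
            warn "line $.: expected 5 fields, skipping\n";
            $bad++;
        }
    }
    return ($good, $bad);
}

# Runs only when executed as a script with a file argument,
# e.g.: perl load.pl bigfile.txt
if (!caller() && @ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<', $path or die "cannot open $path: $!";
    my ($good, $bad) = process_stream($fh, sub {
        my ($fields) = @_;
        # Placeholder: a real loader would execute a prepared INSERT here.
    });
    close $fh;
    print "$good records read, $bad skipped\n";
}
```

Passing the per-record work in as a callback keeps the reading loop separate from whatever the DB layer turns out to be. With an actual database handle, committing every few thousand rows rather than after every row would likely matter a great deal for a file of this size.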

Re: Reading 39GB Flat File?
by erix (Prior) on Nov 12, 2009 at 16:50 UTC

    Your database (which one you don't mention) probably has a way to import bulk data. For postgres, there is the COPY command which will read from STDIN (or directly from file):

    cat file.txt | psql -d dbname -c "copy mytable from stdin csv"

    'mytable' must have the appropriate columns, of course.

    ( A site-search on 'Bulk import' will probably yield bulk import methods for each RDBMS: bcp for sybase/mssql, imp for oracle, etc. )

    update:

    ok, ok... To appease the venerable and dodge the infamous award, this should be written as:

    < file.txt psql -d dbname -c "copy mytable from stdin csv"

    Although a process isn't as expensive as it used to be :)

      cat file.txt | ...
      Had this been a usenet posting, I would have given you a Useless use of cat award.

      -- Randal L. Schwartz, Perl hacker
