
Re^2: Best method to load a csv file.

by samgold (Scribe)
on Dec 17, 2004 at 15:12 UTC ( #415655=note )

in reply to Re: Best method to load a csv file.
in thread Best method to load a csv file.

The problem with that is that I have to manipulate the data before I can load it. I could manipulate the data, write it to a new file, and then load it using SQL*Loader, but I would rather have just one script to load the data. The amount of data I load on a regular basis is anywhere from 300,000 to 400,000 records, spread out among 5 or 6 different files, each with a different format. The largest file, about 200k records, only takes a few hours to run. I am not too concerned about how long it takes; I just want to know whether there is a better way to load the data using Perl/DBI.
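As a minimal sketch of the Perl/DBI side, the core Text::ParseWords module can split each CSV record (including quoted fields containing commas) before the row is handed to a DBI prepared insert. The sample record and the three-column layout below are invented for illustration; the real files each have their own format:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Parse CSV records with the core Text::ParseWords module, then
# hand each cleaned row to a DBI prepared insert.  The sample
# record and three-column layout are made up for illustration.
my @records = ('"Smith, John",2004-12-17,"1,250.00"');

my @rows;
for my $line (@records) {
    my @fields = parse_line(',', 0, $line);   # 0 = strip the quotes
    # In the real loader you would now do something like:
    #   $sth->execute(@fields);  # $sth from $dbh->prepare('INSERT INTO t VALUES (?,?,?)')
    push @rows, \@fields;
}

print join('|', @{ $rows[0] }), "\n";   # prints: Smith, John|2004-12-17|1,250.00
```

Note that the quoted comma in "Smith, John" survives as part of one field, which a naive split on commas would get wrong.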


Replies are listed 'Best First'.
Re^3: Best method to load a csv file.
by diotalevi (Canon) on Dec 17, 2004 at 15:32 UTC
    The database-loading portion of the task is often resource-intensive enough that you will want to use your database's import facility. Just process your data in your script prior to giving it to the database. That's the way most of us do it anyway. You should also be able to reduce your runtime from a few hours per file to a few minutes for everything if you do this.

      Plus, you can always call your database import utility from the same perl program that modifies your data when it's complete. With that number of records, I'd still write it out to a text file, though, before calling the external db load utility.
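A sketch of that "transform in Perl, write a flat file, then call the bulk loader" flow, assuming Oracle's sqlldr. The cleanup rule (uppercasing the first field), the control-file name, and the scott/tiger login are placeholders; the control=, data=, and log= arguments are standard SQL*Loader options:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Transform the raw records, write them to a flat file, then build
# the sqlldr command.  The uppercasing rule, control-file name, and
# scott/tiger login are placeholders for illustration.
my @input = ("abc,2004-12-17,100\n", "def,2004-12-18,200\n");

my ($fh, $datafile) = tempfile(SUFFIX => '.dat');
for my $line (@input) {
    my @f = split /,/, $line;
    $f[0] = uc $f[0];                 # example transformation
    print {$fh} join(',', @f);
}
close $fh or die "close: $!";

# Build (but do not run) the loader command; in production you would:
#   system(@cmd) == 0 or die "sqlldr failed: $?";
my @cmd = ('sqlldr', 'scott/tiger', 'control=layout1.ctl',
           "data=$datafile", "log=$datafile.log");
print "would run: @cmd\n";
```

One Perl program per file layout (or one program dispatching on layout) keeps the whole job in a single place while still letting the bulk loader do the heavy lifting.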

      Amateurs discuss tactics. Professionals discuss logistics. And... my cat's breath smells like cat food.
        That's true. So, back to the question of parsing the CSV file: which do you think is better or faster? The way I am doing it now with Text::ParseWords, or DBD::CSV, or Text::CSV?
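Which module is fastest is best settled by measuring on your own data; the core Benchmark module makes that easy. A sketch comparing Text::ParseWords against a naive split, which is faster but mishandles quoted commas (the sample record is invented):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Text::ParseWords qw(parse_line);

# A naive split is fast but breaks on quoted commas; parse_line is
# correct.  Benchmark both on a sample record to see the trade-off.
my $line = '"Smith, John",2004-12-17,"1,250.00"';

my @naive   = split /,/, $line;            # 5 fields -- wrong
my @correct = parse_line(',', 0, $line);   # 3 fields -- right

printf "naive: %d fields, parse_line: %d fields\n",
       scalar @naive, scalar @correct;

cmpthese(10_000, {
    naive      => sub { my @f = split /,/, $line },
    parse_line => sub { my @f = parse_line(',', 0, $line) },
});
```

The same cmpthese harness can be extended with Text::CSV (or Text::CSV_XS) subs if those modules are installed, to compare all the candidates on your actual record format.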

Re^3: Best method to load a csv file.
by RiotTown (Scribe) on Dec 17, 2004 at 17:19 UTC
    I'd suggest a shell script that would fire off a SQL*Loader process for each of the different file layouts.

    From documentation on SQL*Loader:

    To load data from multiple datafiles in one SQL*Loader run, use an INFILE clause for each datafile. Datafiles need not have the same file processing options, although the layout of the records must be identical.

    Write a control file for each of the 5 or 6 different data file layouts then make sure to reference that in your calling statement.
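For reference, a hypothetical control file for one such layout, showing multiple INFILE clauses for datafiles that share the same record layout (table and column names are invented):

```
-- layout1.ctl: hypothetical control file for one file layout
LOAD DATA
INFILE 'layout1_a.dat'
INFILE 'layout1_b.dat'
APPEND
INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(cust_name, load_date DATE "YYYY-MM-DD", amount)
```

Each of the 5 or 6 layouts would get its own control file like this, referenced by the corresponding sqlldr invocation.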
Re^3: Best method to load a csv file.
by bart (Canon) on Dec 18, 2004 at 12:01 UTC
    My experience with MS-Access leads me to believe that generating a new CSV file in the format you want, and then importing it, is up to several orders of magnitude faster than using DBI to insert every item into the database one by one: a few seconds to generate the text files, as opposed to many minutes using DBI, for tables of a few tens of thousands of records.

    So, as a general approach, I'd advise you to generate the flat files with Perl, and use the database's loader/import mechanism to actually read the data into the database.

    Note that reading all the data out of the database using Perl+DBI isn't nearly as slow as inserting/updating the records.

    Update: I notice that jdtoronto, based on his own independent experience, came to the same conclusion (point 3).
