
Re^2: Best method to load a csv file.

by samgold (Scribe)
on Dec 17, 2004 at 15:12 UTC ( #415655=note )

in reply to Re: Best method to load a csv file.
in thread Best method to load a csv file.

The problem with that is that I have to manipulate the data before I can load it. I could manipulate the data, write it to a new file, and then load it using SQL*Loader, but I would rather have just one script to load the data. The amount of data I load on a regular basis is anywhere from 300,000 to 400,000 records, spread out among 5 or 6 different files, each with a different format. The largest file, about 200k records, only takes a few hours to run. I am not too concerned about how long it takes; I just want to know whether there is a better way to load the data using Perl/DBI.
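As a minimal sketch of the Perl/DBI side, the core Text::ParseWords module can split each CSV record (including quoted fields containing commas) before the row is handed to a DBI prepared insert. The sample record and the three-column layout below are invented for illustration; the real files each have their own format:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Parse CSV records with the core Text::ParseWords module, then
# hand each cleaned row to a DBI prepared insert.  The sample
# record and three-column layout are made up for illustration.
my @records = ('"Smith, John",2004-12-17,"1,250.00"');

my @rows;
for my $line (@records) {
    my @fields = parse_line(',', 0, $line);   # 0 = strip the quotes
    # In the real loader you would now do something like:
    #   $sth->execute(@fields);  # $sth from $dbh->prepare('INSERT INTO t VALUES (?,?,?)')
    push @rows, \@fields;
}

print join('|', @{ $rows[0] }), "\n";   # prints: Smith, John|2004-12-17|1,250.00
```

Note that the quoted comma in "Smith, John" survives as part of one field, which a naive split on commas would get wrong.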


Replies are listed 'Best First'.
Re^3: Best method to load a csv file.
by diotalevi (Canon) on Dec 17, 2004 at 15:32 UTC
    The database-loading portion of the task is often resource-intensive enough that you will want to use your database's import facility. Just process your data in your script prior to giving it to the database. That's the way most of us do it anyway. You should also be able to reduce your runtime from a few hours per file to a few minutes for everything if you do this.

      Plus, you can always call your database import utility from the same perl program that modifies your data when it's complete. With that number of records, I'd still write it out to a text file, though, before calling the external db load utility.
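A sketch of that "transform in Perl, write a flat file, then call the bulk loader" flow, assuming Oracle's sqlldr. The cleanup rule (uppercasing the first field), the control-file name, and the scott/tiger login are placeholders; the control=, data=, and log= arguments are standard SQL*Loader options:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Transform the raw records, write them to a flat file, then build
# the sqlldr command.  The uppercasing rule, control-file name, and
# scott/tiger login are placeholders for illustration.
my @input = ("abc,2004-12-17,100\n", "def,2004-12-18,200\n");

my ($fh, $datafile) = tempfile(SUFFIX => '.dat');
for my $line (@input) {
    my @f = split /,/, $line;
    $f[0] = uc $f[0];                 # example transformation
    print {$fh} join(',', @f);
}
close $fh or die "close: $!";

# Build (but do not run) the loader command; in production you would:
#   system(@cmd) == 0 or die "sqlldr failed: $?";
my @cmd = ('sqlldr', 'scott/tiger', 'control=layout1.ctl',
           "data=$datafile", "log=$datafile.log");
print "would run: @cmd\n";
```

One Perl program per file layout (or one program dispatching on layout) keeps the whole job in a single place while still letting the bulk loader do the heavy lifting.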

      Amateurs discuss tactics. Professionals discuss logistics. And... my cat's breath smells like cat food.
        That's true. So, back to the question of parsing the CSV file: which do you think is better or faster? The way I am doing it now with Text::ParseWords, or DBD::CSV, or Text::CSV?
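Which module is fastest is best settled by measuring on your own data; the core Benchmark module makes that easy. A sketch comparing Text::ParseWords against a naive split, which is faster but mishandles quoted commas (the sample record is invented):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Text::ParseWords qw(parse_line);

# A naive split is fast but breaks on quoted commas; parse_line is
# correct.  Benchmark both on a sample record to see the trade-off.
my $line = '"Smith, John",2004-12-17,"1,250.00"';

my @naive   = split /,/, $line;            # 5 fields -- wrong
my @correct = parse_line(',', 0, $line);   # 3 fields -- right

printf "naive: %d fields, parse_line: %d fields\n",
       scalar @naive, scalar @correct;

cmpthese(10_000, {
    naive      => sub { my @f = split /,/, $line },
    parse_line => sub { my @f = parse_line(',', 0, $line) },
});
```

The same cmpthese harness can be extended with Text::CSV (or Text::CSV_XS) subs if those modules are installed, to compare all the candidates on your actual record format.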

Re^3: Best method to load a csv file.
by RiotTown (Scribe) on Dec 17, 2004 at 17:19 UTC
    I'd suggest a shell script that would fire off a SQL*Loader process for each of the different file layouts.

    From documentation on SQL*Loader:

    To load data from multiple datafiles in one SQL*Loader run, use an INFILE clause for each datafile. Datafiles need not have the same file processing options, although the layout of the records must be identical.

    Write a control file for each of the 5 or 6 different data file layouts then make sure to reference that in your calling statement.
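For reference, a hypothetical control file for one such layout, showing multiple INFILE clauses for datafiles that share the same record layout (table and column names are invented):

```
-- layout1.ctl: hypothetical control file for one file layout
LOAD DATA
INFILE 'layout1_a.dat'
INFILE 'layout1_b.dat'
APPEND
INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(cust_name, load_date DATE "YYYY-MM-DD", amount)
```

Each of the 5 or 6 layouts would get its own control file like this, referenced by the corresponding sqlldr invocation.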
Re^3: Best method to load a csv file.
by bart (Canon) on Dec 18, 2004 at 12:01 UTC
    My experience with MS-Access leads me to believe that generating a new CSV file in the format you want, and then importing it, is up to several orders of magnitude faster than using DBI to insert every item into the database one by one: a few seconds to generate the text files, as opposed to many minutes using DBI, for tables of a few tens of thousands of records.

    So, as a general approach, I'd advise you to generate the flat files with Perl, and use the database's loader/import mechanism to actually read the data into the database.

    Note that reading all the data out of the database using Perl+DBI isn't nearly as slow as inserting/updating the records.

    Update: I notice that jdtoronto, based on his own independent experience, came to the same conclusion (point 3).
