Re: Best method to load a csv file.

by jdtoronto (Prior)
on Dec 17, 2004 at 18:29 UTC [id://415733]


in reply to Best method to load a csv file.

Much of my time is spent doing just this. Several comments:
  • Text::CSV_XS has been the most reliable I have found. It is flexible and so far I haven't found a 'CSV' file I cannot handle. Text::CSV::Simple is too simple and won't write CSV format.
  • When it comes to processing the data: in some cases I import the CSV file into a temporary table using MySQL's LOAD functions, then process the data in the table. This is useful in Tk based apps where I can use Tk::DBI::Table to preview the data for users, especially where they are doing something like mapping input data fields to our database structure.
  • Where I have bulk data, or something is done routinely in a known structure, I do as has been suggested earlier: I read the CSV file, process it, spit it out as a file again, then use LOAD from within the Perl script to have MySQL import it (a sketch of this workflow follows below). The speed advantage is amazing! I have one job which imports around 100,000 records per day; using DBI to insert them after reading the file took 7-8 minutes. Pre-processing takes about 35 seconds and the MySQL load averages 170ms!
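
For illustration, here is a minimal sketch of that pre-process-then-LOAD workflow. The file names, column layout, connection details and the csv_import table are all placeholders, not taken from the job described above; the point is the shape of the pipeline: Text::CSV_XS reads and rewrites the data, then a single LOAD DATA LOCAL INFILE hands the cleaned file to MySQL.

    #!/usr/bin/perl
    # Sketch: clean a CSV with Text::CSV_XS, then bulk-load it into MySQL.
    # File names, columns and the csv_import table are placeholders.
    use strict;
    use warnings;
    use Text::CSV_XS;
    use DBI;

    my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n", auto_diag => 1 });

    open my $in,  '<', 'vendor_feed.csv'  or die "vendor_feed.csv: $!";
    open my $out, '>', 'cleaned_feed.csv' or die "cleaned_feed.csv: $!";

    while (my $row = $csv->getline($in)) {
        # ... per-row clean-up / field mapping goes here ...
        $csv->print($out, $row);    # eol => "\n" supplies the line ending
    }
    close $in;
    close $out;

    # One bulk load instead of tens of thousands of individual INSERTs.
    my $dbh = DBI->connect(
        'dbi:mysql:database=mydb;mysql_local_infile=1',
        'user', 'password', { RaiseError => 1 },
    );
    $dbh->do(q{
        LOAD DATA LOCAL INFILE 'cleaned_feed.csv'
        INTO TABLE csv_import
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
    });
    $dbh->disconnect;
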
Good luck! I hope this helps.

jdtoronto

Replies are listed 'Best First'.
Re^2: Best method to load a csv file.
by doom (Deacon) on Dec 18, 2004 at 08:50 UTC
    jdtoronto wrote:
    Text::CSV_XS has been the most reliable I have found. It is flexible and so far I haven't found a 'CSV' file I cannot handle.
    Well I certainly have, though I don't remember what the difficulties were in detail now (I think it was something like the "csv" file had spaces after the commas -- the trouble with csv is that there is no real standard for it, only de facto standards).

    I've heard that DBD::AnyData with trim=>1 can deal with spaces after commas, but I haven't tried it myself.

    With Text::CSV_XS, you almost certainly want to use the "binary" option. Otherwise you'll have problems with values that have extended characters or embedded newlines.
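
    A tiny sketch of what that looks like (the sample record is made up): without binary => 1, the embedded newline in the second field would make the parse fail.

        use Text::CSV_XS;

        my $csv = Text::CSV_XS->new({ binary => 1 });
        $csv->parse(qq{1,"line one\nline two",caf\xe9})
            or die "parse failed: " . $csv->error_input;
        my @fields = $csv->fields;   # newline and the 0xE9 byte survive intact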

    DBD::CSV uses Text::CSV_XS internally, so if Text::CSV_XS (with binary on) is no good, don't expect DBD::CSV to do any better. If I remember right, there's something a little screwy with the way DBD::CSV converts the header row into database column names (e.g. you may have trouble if there are spaces in your column descriptions). Either fix up the first row of your csv file manually, or look for a way to tell it what the column names should be, overriding the header row (as I remember it, there *is* a way, though I don't see it in the man page at the moment); a guess at what that looks like follows below.
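
    If memory serves, the per-table attributes are where that lives. Something along these lines (attribute names quoted from memory, so check the DBD::CSV docs before relying on them) supplies your own column names and skips the header row in the file:

        use DBI;

        my $dbh = DBI->connect('dbi:CSV:', undef, undef, { RaiseError => 1 });

        # Map the table name to a file, supply our own column names, and
        # skip the (possibly unusable) header row in the file itself.
        $dbh->{csv_tables}{people} = {
            file           => 'people.csv',
            col_names      => [qw(id first_name last_name email)],
            skip_first_row => 1,
        };

        my $sth = $dbh->prepare('SELECT first_name, email FROM people');
        $sth->execute;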

    You should take a look at this: dbi_dealing_with_csv
