Re: Best method to load a csv file.

by jdtoronto (Prior)
on Dec 17, 2004 at 18:29 UTC [id://415733]


in reply to Best method to load a csv file.

Much of my time is spent doing just this. Several comments:
  • Text::CSV_XS has been the most reliable I have found. It is flexible and so far I haven't found a 'CSV' file I cannot handle. Text::CSV::Simple is too simple and won't write CSV format.
  • When it comes to processing the data: in some cases I import the CSV file into a temporary table using MySQL's LOAD functions, then process the data in the table. This is useful in Tk based apps where I can use Tk::DBI::Table to preview the data for users, especially where they are doing something like mapping input data fields to our database structure.
  • Where I have bulk data, or something is done routinely in a known structure, I do as has been suggested earlier: I read the CSV file, process it, spit it out as a file again, then use LOAD from within the Perl script to have MySQL import it (a sketch of this workflow follows below). The speed advantage is amazing! I have one job which imports around 100,000 records per day; using DBI to insert them after reading the file took 7-8 minutes. Pre-processing takes about 35 seconds and the MySQL load averages 170ms!
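
For illustration, here is a minimal sketch of that pre-process-then-LOAD workflow. The file names, column layout, connection details and the csv_import table are all placeholders, not taken from the job described above; the point is the shape of the pipeline: Text::CSV_XS reads and rewrites the data, then a single LOAD DATA LOCAL INFILE hands the cleaned file to MySQL.

    #!/usr/bin/perl
    # Sketch: clean a CSV with Text::CSV_XS, then bulk-load it into MySQL.
    # File names, columns and the csv_import table are placeholders.
    use strict;
    use warnings;
    use Text::CSV_XS;
    use DBI;

    my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n", auto_diag => 1 });

    open my $in,  '<', 'vendor_feed.csv'  or die "vendor_feed.csv: $!";
    open my $out, '>', 'cleaned_feed.csv' or die "cleaned_feed.csv: $!";

    while (my $row = $csv->getline($in)) {
        # ... per-row clean-up / field mapping goes here ...
        $csv->print($out, $row);    # eol => "\n" supplies the line ending
    }
    close $in;
    close $out;

    # One bulk load instead of tens of thousands of individual INSERTs.
    my $dbh = DBI->connect(
        'dbi:mysql:database=mydb;mysql_local_infile=1',
        'user', 'password', { RaiseError => 1 },
    );
    $dbh->do(q{
        LOAD DATA LOCAL INFILE 'cleaned_feed.csv'
        INTO TABLE csv_import
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
    });
    $dbh->disconnect;
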
Good luck! I hope this helps.

jdtoronto

Replies are listed 'Best First'.
Re^2: Best method to load a csv file.
by doom (Deacon) on Dec 18, 2004 at 08:50 UTC
    jdtoronto wrote:
    Text::CSV_XS has been the most reliable I have found. It is flexible and so far I haven't found a 'CSV' file I cannot handle.
    Well I certainly have, though I don't remember what the difficulties were in detail now (I think it was something like the "csv" file had spaces after the commas -- the trouble with csv is that there is no real standard for it, only de facto standards).

    I've heard that DBD::AnyData with trim=>1 can deal with spaces after commas, but I haven't tried it myself.

    With Text::CSV_XS, you almost certainly want to use the "binary" option. Otherwise you'll have problems with values that have extended characters or embedded newlines.
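
    A tiny sketch of what that looks like (the sample record is made up): without binary => 1, the embedded newline in the second field would make the parse fail.

        use Text::CSV_XS;

        my $csv = Text::CSV_XS->new({ binary => 1 });
        $csv->parse(qq{1,"line one\nline two",caf\xe9})
            or die "parse failed: " . $csv->error_input;
        my @fields = $csv->fields;   # newline and the 0xE9 byte survive intact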

    DBD::CSV uses Text::CSV_XS internally, so if Text::CSV_XS (with binary on) is no good, don't expect DBD::CSV to do any better. If I remember right, there's something a little screwy with the way DBD::CSV converts the header row into database column names (e.g. you may have trouble if there are spaces in your column descriptions). Either fix up the first row of your csv file manually, or look for a way to tell it what the column names should be, overriding the header row (as I remember it, there *is* a way, though I don't see it in the man page at the moment); a guess at what that looks like follows below.
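
    If memory serves, the per-table attributes are where that lives. Something along these lines (attribute names quoted from memory, so check the DBD::CSV docs before relying on them) supplies your own column names and skips the header row in the file:

        use DBI;

        my $dbh = DBI->connect('dbi:CSV:', undef, undef, { RaiseError => 1 });

        # Map the table name to a file, supply our own column names, and
        # skip the (possibly unusable) header row in the file itself.
        $dbh->{csv_tables}{people} = {
            file           => 'people.csv',
            col_names      => [qw(id first_name last_name email)],
            skip_first_row => 1,
        };

        my $sth = $dbh->prepare('SELECT first_name, email FROM people');
        $sth->execute;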

    You should take a look at this: dbi_dealing_with_csv
