lihao has asked for the wisdom of the Perl Monks concerning the following question:
Here is my scenario: I have over 200K records which I read from a CSV file and need to pre-process (the adjust_format() subroutine in the code below) before importing them into a MySQL DB.
In the table mytbl there is a unique-key constraint combining three columns, and it's possible that some input records share the same unique key (so they are duplicate entries and only one copy should be kept). What I have done so far:
# where MyApp::DBIx is a sub-class of DBIx::Simple,
# and $csvfile is an already-open filehandle for the CSV input
use Text::CSV;
use MyApp::DBIx;

my $csv = Text::CSV->new({ binary => 1 });
my $db  = MyApp::DBIx->new();
my $sql = qq[ REPLACE INTO `mytbl` VALUES (??) ];

while (my $row = $csv->getline($csvfile)) {
    adjust_format($row);
    $db->query($sql, @{$row});
}
Note: if I use INSERT instead of REPLACE in the above SQL statement, my script breaks as soon as a duplicate is found.
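One thing I have been wondering about (but have not benchmarked) is whether committing in larger batches would help. A rough, untested sketch, reusing $csv, $db, $sql and adjust_format() from above; the batch size of 1000 is an arbitrary guess:

my $batch = 1000;
my $count = 0;

$db->dbh->begin_work;                 # plain DBI transaction control on the underlying handle
while (my $row = $csv->getline($csvfile)) {
    adjust_format($row);
    $db->query($sql, @{$row});
    if (++$count % $batch == 0) {
        $db->dbh->commit;             # flush this batch
        $db->dbh->begin_work;         # start the next one
    }
}
$db->dbh->commit;                     # flush the final partial batch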
So what's the best way to do bulk insertion with unique-key constraints? My original loop above, replacing one row at a time, is too slow. I know I can process each record in Perl, print the tab-delimited columns out to a file, and then use a MySQL command like "LOAD DATA...":
LOAD DATA INFILE '/path/to/import.dat' INTO TABLE `mytbl`;
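From the Perl side, what I have in mind is roughly the following untested sketch. It reuses $csv, $db and adjust_format() from above, writes a tab-delimited file, and relies on LOAD DATA's own REPLACE keyword to overwrite rows that collide on the unique key. It also assumes LOCAL infile is allowed by both the client (e.g. mysql_local_infile in the DSN) and the server:

my $tsv = Text::CSV->new({ binary => 1, sep_char => "\t", eol => "\n" });

open my $out, '>', '/path/to/import.dat' or die "Cannot write import.dat: $!";
while (my $row = $csv->getline($csvfile)) {
    adjust_format($row);
    $tsv->print($out, $row);          # one tab-delimited line per record
}
close $out or die "Cannot close import.dat: $!";

# Note: fields containing tabs, newlines or quotes would need matching
# ENCLOSED BY / ESCAPED BY settings here.
$db->query(q[
    LOAD DATA LOCAL INFILE '/path/to/import.dat'
    REPLACE INTO TABLE `mytbl`
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
]);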
This sounds OK. My question is: are there better Perl ways to handle bulk data insertion that can take care of all of the above (the pre-processing and the duplicate keys)?
Many thanks
lihao