Here is my scenario, I have over 200K records which I read from a CSV file and need to pre-proccess them(adjust_format() subroutine in the below code) before importimg them into MySQL DB.

In the table mytbl, there is an unique-key constraint by combining three columns. and it's possible that some input records have the same unique keys(so they are duplicated entries and should be kept only one copy). What I did so far:

# where MyApp::DBIx is a sub-class of DBIx::Simple, my $csv = Text::CSV->new({ binary => 1 }); my $db = MyApp::DBIx->new(); my $sql = qq[ REPLACE INTO `mytbl` VALUES (??) ]; while (my $row = $csv->getline($csvfile)) { adjust_format($row); $db->query($sql, @{$row}); }

Note: If I use INSERT instead of REPLACE in the above SQL command, my script will be broken when any duplication found

So what's the best way to do bulk insertion with unique key constraints. The above code by insert(replace) one row each time is too slow. I know I can process each records in Perl and then print out tab delimited columnes into a file and then use a MySQL command like "LOAD DATA..."

LOAD DATA INFILE '/path/to/import.dat' INTO TABLE `mytbl`;

This sounds OK.. My question is: are there better Perl ways to handle bulk data insertion, which can take care all the following stuff

Many thanks

lihao


In reply to DBI Question: bulk insertion with Unique-key constraint by lihao

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.