in reply to Optimizing mysql and inserts

You ought to start with Devel::NYTProf to identify that it is actually your inserts that are chewing up so much time. You're probably correct, but it's good to know for sure before investing time in optimizing. However, there is one thing that stands out for me:

This snippet:

my $sql = sprintf('INSERT INTO entry VALUES(%s, %s, %s, %s, %s, %s, + %s, %s, %s)', $dbh->quote($host), $dbh->quote($filename), $dbh->quote($datestamp), $dbh->quote($type), $dbh->quote($size), $dbh->quote($from), $dbh->quote($to), $dbh->quote($subj), $dbh->quote($score) );

...could be more easily (and no less efficiently) written like this:

my $sth = $dbh->prepare( 'INSERT INTO entry VALUES(?,?,?,?,?,?,?,?,?) +' ); $sth->execute( $host, $filename, $datestamp, $type, $size, $from, $to, $sub, $score );

But now for an efficiency improvement: Since this method separates the preparation of the SQL from the binding of the values, you can move the prepare() statement outside of the loop. Do the prepare(...) before you start looping, and then inside the loop just keep calling $sth->execute() as many times as needed.

The reason this will be more efficient is because your SQL query won't have to be re-compiled by the server every time you do an insert; it will compile once and reuse over and over again with each set of bind-values.

There are other techniques too. For example, MySQL has a "LOAD DATA INFILE" statement that can pull your data quickly from a file. This should be significantly faster, with one big caveat: If your data isn't already in a file, it could be more expensive to pull all your data into a file first. In your case, that seems to be the case: You're not pulling from a file, but rather, from another database. It would probably not be worth spilling into an output file unless your source database also has a super-fast means of bulk dumping (something like "SELECT INTO OUTFILE"). Even then, you would have to benchmark to see if there is enough of a savings to be worthwhile.

Another technique is to tell MySQL to delay updating of its indices until after all of the inserts. But now we're getting into MySQL optimization, not Perl optimization. I recently read the O'Reilly book High Performance MySQL, and found it to be a nice resource. If you'll be doing more of this in the future, you might pick up a copy.


Dave