e2ka has asked for the wisdom of the Perl Monks concerning the following question:

O wise and merciful Monks, please consider the first question from a fool such as I,

I am building a database that will contain time-variable information which can be queried by its Interval of Validity (IoV). This is simply a time span given by a DATETIME since and a DATETIME till. The rule is that you cannot insert an IoV that overlaps with IoVs already in the database. The other rule is that insertions should be as fast as possible. To satisfy the first rule, a check must be made for overlaps before every insertion attempt.
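To illustrate the first rule: two half-open intervals [since1, till1) and [since2, till2) overlap exactly when each one starts before the other ends, so the check reduces to a single pair of comparisons. A minimal sketch (simpler than the three OR'd conditions in my test case below; $dbh, $logic_id, $new_since and $new_till are assumed to be set up already):

    # An existing interval overlaps the new one iff it starts before the
    # new one ends AND ends after the new one starts.
    my $overlap_sql = qq[ SELECT 1 FROM COND_dummy
                          WHERE logic_id = ?
                            AND since < ?
                            AND till  > ?
                          LIMIT 1 ];
    my $check = $dbh->prepare($overlap_sql);
    $check->execute($logic_id, $new_till, $new_since);
    my ($clash) = $check->fetchrow_array;
    die "IoV overlaps an existing interval\n" if $clash;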

I am finding that performing this check drags performance down steadily as the table grows. I seem to have narrowed it down to a problem with, or my misunderstanding of, the Perl DBI, as I wrote a test case in the vile tongue Java and saw no decrease in performance.

Here is the idea behind my test case:

connect to database and recreate tables
insert 10000 rows into the parent table (channelView)
prepare the statements for checking for overlaps and inserting
    into the data table (COND_dummy)
for 10000 insertions {
    make a new iov
    check that the iov does not overlap
    insert the iov and some data
    if we have inserted 100 things, commit a transaction
    if we have inserted 1000 things, report performance
}

Here are the results of the test case on my setup (Perl 5.8, MySQL 4.1, the newest DBI and DBD::mysql, fast machine...). In the right column you see the number of inserts per second.

Connect to database...Done.
Reset database...Done.
Define the database...Done.
Fill channelView...Done.
Begin insertions...

[#insert] [date time]         [ins/s]
     1000 2005-03-08 12:38:25 808.696
     2000 2005-03-08 12:38:27 482.931
     3000 2005-03-08 12:38:30 329.941
     4000 2005-03-08 12:38:34 255.688
     5000 2005-03-08 12:38:39 209.914
     6000 2005-03-08 12:38:45 169.025
     7000 2005-03-08 12:38:51 153.192
     8000 2005-03-08 12:38:59 133.599
     9000 2005-03-08 12:39:07 120.618
    10000 2005-03-08 12:39:16 109.312

What follows is my Perl test case, which does exactly the same as the Java test case, and has (quickly) decreasing performance.

#!/usr/bin/perl

use warnings;
use strict;
$|++;

use DBI;
use POSIX;
use Time::HiRes qw/gettimeofday/;
use Data::Dumper;

my $db        = "test_condDb_perform";
my $cond_name = "COND_dummy";

print "Connect to database...";
my $dbh = DBI->connect("DBI:mysql:host=localhost;db=$db", "", "",
                       {RaiseError => 1, AutoCommit => 1});
print "Done.\n";

# reset the DB
print "Reset database...";
$dbh->do( qq[ DROP DATABASE $db ] );
$dbh->do( qq[ CREATE DATABASE $db ] );
$dbh->do( qq[ USE $db ] );
print "Done.\n";

# define the database
print "Define the database...";
$dbh->begin_work();
my $sql;
$sql = qq[ CREATE TABLE channelView (
             logic_id INT PRIMARY KEY
           ) TYPE=InnoDB ];
$dbh->do($sql);

$sql = qq[ CREATE TABLE $cond_name (
             logic_id INT,
             since    DATETIME,
             till     DATETIME,
             value    FLOAT,
             PRIMARY KEY (logic_id, since, till),
             INDEX IoV (since, till),
             FOREIGN KEY (logic_id) REFERENCES channelView (logic_id)
           ) TYPE=InnoDB ];
$dbh->do($sql);
$dbh->commit();
print "Done.\n";

# fill up the channelView
print "Fill channelView...";
$dbh->begin_work();
$sql = qq[ INSERT INTO channelView SET logic_id = ? ];
my $fill_chview_h = $dbh->prepare_cached($sql);
for (1..10000) {
    $fill_chview_h->execute($_);
}
$dbh->commit();
print "Done.\n";

# insert and write performance
print "Begin insertions...\n\n";
my $iov_seed = time;
my $logic_id = 9999;

my $count       = 0;
my $MAXCOUNT    = 1000;
my $total       = 0;
my $start       = gettimeofday;
my $commit_cnt  = 0;
my $commit_size = 100;

$sql = qq[ SELECT logic_id, since, till FROM $cond_name
           WHERE logic_id = ?
             AND ((since >= ? AND since < ?) OR
                  (till  >  ? AND till  < ?) OR
                  (? >= since AND ? < till))
           LIMIT 1 ];
my $overlap_check_h = $dbh->prepare_cached($sql);

$sql = qq[ INSERT INTO $cond_name SET logic_id = ?,
                                      since    = ?,
                                      till     = ?,
                                      value    = ? ];
my $insert_h = $dbh->prepare_cached($sql);

$dbh->begin_work();
for (0..10000) {
    my ($t1, $t2) = make_iov();
    $overlap_check_h->execute($logic_id, $t1, $t2, $t1, $t2, $t1, $t2);
    my @overlap = $overlap_check_h->fetchrow_array();
    if (@overlap) {
        print "OVERLAP! That's impossible!\n";
    } else {
        $insert_h->execute($logic_id, $t1, $t2, 1);
        $count++;
        $commit_cnt++;
        if ($commit_cnt == $commit_size) {
            $dbh->commit();
            $dbh->begin_work();
            $commit_cnt = 0;
        }
        if ($count == $MAXCOUNT) { report(); }
    }
}
$dbh->commit();

sub report {
    printf "%9s %19s %9s\n", "[#insert]", "[date time]", "[ins/s]"
        if $total == 0;
    my $now = gettimeofday;
    $total += $count;
    printf "%9d %s %9.3f\n", $total,
        strftime("%Y-%m-%d %H:%M:%S", localtime), $count / ($now - $start);
    $count = 0;
    $start = $now;
}

sub make_iov {
    return ( strftime("%Y-%m-%d %H:%M:%S", localtime($iov_seed)   ),
             strftime("%Y-%m-%d %H:%M:%S", localtime(++$iov_seed) ) );
}

What is the cause of this horrible decrease in performance? Please, share your wisdom.

-e2ka

Edit by BazB - added readmore tags

Re: DBI performance problem with MySQL
by dragonchild (Archbishop) on Mar 08, 2005 at 20:03 UTC

    The problem is that the current DBD::mysql treats all bind parameters passed in through execute() as strings. That will make all your date arithmetic very very slow. You want to use bind_param() and the SQL types and that should help a bunch. There's a new beta-test version by gmax that will do server-side prepares and that will fix the problem.
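    A minimal sketch of what I mean, reusing the handle and variable names from your test case (only the binding changes; the prepare stays the same):

        use DBI qw(:sql_types);   # imports SQL_INTEGER, SQL_DATETIME, SQL_FLOAT, ...

        # Bind each placeholder with an explicit SQL type instead of
        # letting execute() pass everything as a string.
        $overlap_check_h->bind_param(1, $logic_id, SQL_INTEGER);
        $overlap_check_h->bind_param(2, $t1,       SQL_DATETIME);
        $overlap_check_h->bind_param(3, $t2,       SQL_DATETIME);
        $overlap_check_h->bind_param(4, $t1,       SQL_DATETIME);
        $overlap_check_h->bind_param(5, $t2,       SQL_DATETIME);
        $overlap_check_h->bind_param(6, $t1,       SQL_DATETIME);
        $overlap_check_h->bind_param(7, $t2,       SQL_DATETIME);
        $overlap_check_h->execute();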

    1. Turn AutoCommit off if you are committing in stages.
    2. Use MySQL's multi-value INSERT statement for speed (a sketch follows this list). Even better is to insert your data outside of Perl.
    3. Turn off indices when inserting. An InnoDB FK has an implicit index, so you have the PK index and two other indices on that table.
    4. Don't use a FLOAT for value ... use an INT. That allows MySQL to optimize better.
    5. Have you considered using MySQL's datetime arithmetic and doing seconds comparisons vs. having MySQL attempt to do the string-wise comparisons you're forcing it to?
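    For item 2, a sketch of what a batched multi-value INSERT might look like (names and the batch size are illustrative; $value stands for whatever you are storing):

        # Accumulate rows in Perl, then send one multi-value INSERT
        # per batch instead of one INSERT per row.
        my @rows;
        push @rows, [ $logic_id, $t1, $t2, $value ];   # inside your loop

        if (@rows >= 100) {
            my $placeholders = join ", ", ("(?, ?, ?, ?)") x @rows;
            my $sth = $dbh->prepare(
                "INSERT INTO COND_dummy (logic_id, since, till, value) "
              . "VALUES $placeholders"
            );
            $sth->execute(map { @$_ } @rows);
            @rows = ();
        }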

    Update: gmax has humbly informed me that he didn't write the new DBD::mysql that handles server-side prepares. Instead, he has pointed me to New twist for DBD::mysql which is his discussion of the new work.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      Thanks for your informative reply. I tried some of your suggestions, but am not yet able to solve the problem.

      First I should mention that this is an example I made by simplifying a much larger API I have been developing, so there are reasons for doing some things despite there being no reason in this example.

      Also, I realize there are things I can do to get faster insertion performance (such as using MySQL's LOAD mechanism), but I want the flexibility of making a Perl module to interact with this database. I should change my original "as fast as possible" to "as fast as I can do it with the Perl DBI", which should be plenty fast as long as the speed doesn't keep falling as the table grows!

      (1.) Did that. I was fooling around with AutoCommit and had forgotten to turn it to 0 as I intended.

      (2.) I tried changing it to INSERT ... VALUES. Same behavior.

      (3.) You mean that I have 3 keys on COND_dummy, despite SHOW INDEXES saying I only have the two that I defined? Also, I am relying on the foreign key checks to ensure integrity of the relationship, so I think I should keep it. I tried to use SET FOREIGN_KEY_CHECKS=0 and the behavior was the same.

      (4.) The values I will really store will be FLOAT. (These are measurements)

      (5.) Not sure what you mean by this. Could you explain a bit more?

      The most interesting thing I got from your post is that A) DBD::mysql wasn't doing real binding and B) It will soon. I'm going to try the beta software and see if the server-prepared statements make the difference.

      The real question is whether that will make the difference. The java example I tried (made by a colleague helping me with this problem) used J/Connector 3.0.11... maybe it has "real" prepared statements...

      Here is the changed section of my code:

      # insert and write performance
      print "Begin insertions...\n\n";
      my $iov_seed = time;
      my $logic_id = 9999;

      my $count       = 0;
      my $MAXCOUNT    = 1000;
      my $total       = 0;
      my $start       = gettimeofday;
      my $commit_cnt  = 0;
      my $commit_size = 100;

      $sql = qq[ SELECT logic_id, since, till FROM $cond_name
                 WHERE logic_id = ?
                   AND ((since >= ? AND since < ?) OR
                        (till  >  ? AND till  < ?) OR
                        (? >= since AND ? < till))
                 LIMIT 1 ];
      my $overlap_check_h = $dbh->prepare_cached($sql);

      # $sql = qq[ INSERT INTO $cond_name SET logic_id = ?,
      #                                       since    = ?,
      #                                       till     = ?,
      #                                       value    = ? ];
      $sql = qq[ INSERT INTO $cond_name VALUES (?, ?, ?, ?) ];
      my $insert_h = $dbh->prepare_cached($sql);

      $dbh->do( qq[ SET FOREIGN_KEY_CHECKS=0 ] );
      for (0..10000) {
          my ($t1, $t2) = make_iov();
          $overlap_check_h->bind_param(1, $logic_id, SQL_INTEGER);
          $overlap_check_h->bind_param(2, $t1, SQL_DATETIME);
          $overlap_check_h->bind_param(3, $t2, SQL_DATETIME);
          $overlap_check_h->bind_param(4, $t1, SQL_DATETIME);
          $overlap_check_h->bind_param(5, $t2, SQL_DATETIME);
          $overlap_check_h->bind_param(6, $t1, SQL_DATETIME);
          $overlap_check_h->bind_param(7, $t2, SQL_DATETIME);
          $overlap_check_h->execute();
          my @overlap = $overlap_check_h->fetchrow_array();
          if (@overlap) {
              print "OVERLAP! That's impossible!\n";
          } else {
              $insert_h->bind_param(1, $logic_id, SQL_INTEGER);
              $insert_h->bind_param(2, $t1, SQL_DATETIME);
              $insert_h->bind_param(3, $t2, SQL_DATETIME);
              $insert_h->bind_param(4, 3.1415, SQL_FLOAT);
              $insert_h->execute();
              $count++;
              $commit_cnt++;
              if ($commit_cnt == $commit_size) {
                  $dbh->commit();
                  $commit_cnt = 0;
              }
              if ($count == $MAXCOUNT) { report(); }
          }
      }
      $dbh->do( qq[ SET FOREIGN_KEY_CHECKS=1 ] );
      $dbh->commit();
      $dbh->disconnect();

      and the results of these changes...

      [#insert] [date time]          [ins/s]
           1000 2005-03-08 16:02:23 1053.265
           2000 2005-03-08 16:02:25  555.494
           3000 2005-03-08 16:02:27  374.252
           4000 2005-03-08 16:02:31  280.796
           5000 2005-03-08 16:02:35  221.845
           6000 2005-03-08 16:02:41  189.839
           7000 2005-03-08 16:02:47  162.636
           8000 2005-03-08 16:02:54  142.633
           9000 2005-03-08 16:03:02  126.241
          10000 2005-03-08 16:03:10  114.398

      ... and here is the performance of the java test which accomplishes the same. This is my desired level of performance which I think should be attainable with DBI...

          0:  587.5440658049354 ins/s
       1000: 1117.31843575419 ins/s
       2000: 1658.374792703151 ins/s
       3000: 1968.503937007874 ins/s
       4000: 1968.503937007874 ins/s
       5000: 2150.537634408602 ins/s
       6000: 2212.3893805309735 ins/s
       7000: 2262.443438914027 ins/s
       8000: 2188.1838074398247 ins/s
       9000: 2066.115702479339 ins/s
      10000: 2314.814814814815 ins/s

      Thanks again for your help.

      -e2ka

      Update

      Just installed DBD::mysql 2.9015_1 and re-ran the test. Same behavior:

      [#insert] [date time]          [ins/s]
           1000 2005-03-08 16:32:06 1109.196
           2000 2005-03-08 16:32:08  568.175
           3000 2005-03-08 16:32:11  373.548
           4000 2005-03-08 16:32:14  284.840
           5000 2005-03-08 16:32:19  227.884
           6000 2005-03-08 16:32:24  189.410
           7000 2005-03-08 16:32:31  154.263
           8000 2005-03-08 16:32:38  137.563
           9000 2005-03-08 16:32:47  110.940
          10000 2005-03-08 16:32:59   82.182

      I'm assuming all I need to do is install it and add mysql_server_prepare=1 to my connect string.
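      If I've read the DBD::mysql docs correctly, the connect call becomes something like this (sketch; the rest of the script is unchanged):

          # Request server-side prepared statements via the DSN.
          my $dbh = DBI->connect(
              "DBI:mysql:host=localhost;db=$db;mysql_server_prepare=1",
              "", "",
              { RaiseError => 1, AutoCommit => 0 },
          );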

      Update 2

      I should also mention that I've tried setting the DBI_TRACE and it looks like everything is being executed correctly and in the right order.

        I should also mention that I've tried setting the DBI_TRACE and it looks like everything is being executed correctly and in the right order.

        It's not an ordering issue. If the column is of type DATETIME and the value it's being compared against is a VARCHAR, then no index will be used, meaning you're going to do a full table scan. Do this:

        • Turn DBI_TRACE on.
        • Cut'n'paste the EXACT query that DBI_TRACE tells you is being executed.
        • Do an EXPLAIN on that query (see the sketch after this list).
        • Do the same thing with the Java version and look at the difference.
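        For the EXPLAIN step, something like this (the query here is made up; paste in the literal SQL from your trace, and note the quoted integer, which is the string-binding symptom to look for):

            use Data::Dumper;

            # Run EXPLAIN on the literal query captured from DBI_TRACE and
            # dump the optimizer's plan (the 'key' and 'rows' columns matter).
            my $traced_sql = q[ SELECT logic_id, since, till FROM COND_dummy
                                WHERE logic_id = '9999' LIMIT 1 ];
            my $plan = $dbh->selectall_arrayref("EXPLAIN $traced_sql",
                                                { Slice => {} });
            print Dumper($plan);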

        The behavior you're describing is the increasing cost of a SELECT having to do full table scans over a growing table. That's why I'm 99% sure it's the placeholder thing. I've run into it before and actually tried patching DBD::mysql to do intelligent binding for numeric values, but it didn't go anywhere.
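        One quick way to test the placeholder theory (a sketch that sidesteps binding entirely by interpolating quoted literals; fine for a benchmark, not for production, and it uses a simplified two-comparison overlap predicate for brevity):

            # Build the statement with literal values instead of placeholders;
            # if the slowdown disappears, string-bound placeholders were the culprit.
            my $sth = $dbh->prepare(
                "SELECT logic_id, since, till FROM COND_dummy
                 WHERE logic_id = $logic_id
                   AND since < " . $dbh->quote($t2) . "
                   AND till  > " . $dbh->quote($t1) . "
                 LIMIT 1"
            );
            $sth->execute();
            my @overlap = $sth->fetchrow_array();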

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      DBD::mysql v. 2.9005_3 has been out for about five months. Has anyone used it to their satisfaction? If so, what is your confidence level?
Re: DBI performance problem with MySQL
by RazorbladeBidet (Friar) on Mar 08, 2005 at 19:45 UTC
    I see two things, neither of which is probably the cause of your performance problems.

    1. prepare_cached is unnecessary here, as it is only called twice, with two different statements.

    2. Why set AutoCommit to 1 and then use begin_work/commit everywhere? Why not turn it off and simply commit when you want, as in the sketch below?
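    For item 2, a sketch of what I mean (reusing your variable names):

        # Connect with AutoCommit off; no begin_work() calls are needed,
        # just commit at each batch boundary.
        my $dbh = DBI->connect("DBI:mysql:host=localhost;db=$db", "", "",
                               { RaiseError => 1, AutoCommit => 0 });

        # ...inside the insert loop...
        $dbh->commit() if ++$commit_cnt % $commit_size == 0;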

    That said, given the time I've been able to devote to looking it over, the code looks fine to me, and I can't explain the slowdown.
    --------------
    It's sad that a family can be torn apart by such a simple thing as a pack of wild dogs