in reply to DBI performance problem with MySQL

The problem is that the current DBD::mysql treats all bind parameters passed in through execute() as strings, which makes all your date arithmetic very, very slow. Use bind_param() with explicit SQL types and that should help a bunch. There's also a new beta-test version by gmax that does server-side prepares, which should fix the problem.
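
For example, a minimal sketch of typed binding; the statement, table, and variables here are illustrative, not from the original code:

  use DBI qw(:sql_types);

  # Binding with an explicit SQL type tells the driver the value is a
  # number or a date, not a string to be quoted.
  my $sth = $dbh->prepare(
      'SELECT value FROM readings WHERE id = ? AND taken_at >= ?'
  );
  $sth->bind_param(1, $id,   SQL_INTEGER);
  $sth->bind_param(2, $when, SQL_DATETIME);
  $sth->execute();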

  1. Turn AutoCommit off if you are committing in stages.
  2. Use MySQL's multi-value INSERT statement for speed (see the sketch after this list). Even better is to insert your data outside of Perl.
  3. Turn off indices when inserting. An InnoDB FK has an implicit index, so you have the PK index and two other indices on that table.
  4. Don't use a FLOAT for value ... use an INT. That allows MySQL to optimize better.
  5. Have you considered using MySQL's datetime arithmetic and comparing seconds, instead of forcing it to do string-wise comparisons? (Also sketched below.)
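
To make points 2 and 5 concrete, here's a minimal sketch; the table and column names (readings, taken_at) are made up for illustration:

  # Point 2: one multi-value INSERT is parsed once and sent in one
  # round trip, which is much cheaper than many single-row INSERTs.
  $dbh->do(q[
      INSERT INTO readings (id, taken_at, value)
      VALUES (1, '2005-03-08 16:02:23', 42),
             (2, '2005-03-08 16:02:24', 43),
             (3, '2005-03-08 16:02:25', 44)
  ]);

  # Point 5: bind an integer (epoch seconds) and let MySQL convert it,
  # so the server compares numbers instead of strings.
  my $sth = $dbh->prepare(q[
      SELECT id FROM readings WHERE taken_at >= FROM_UNIXTIME(?)
  ]);
  $sth->execute(time() - 3600);    # rows from the last hour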

Update: gmax has humbly informed me that he didn't write the new DBD::mysql that handles server-side prepares. Instead, he pointed me to New twist for DBD::mysql, which is his discussion of the new work.

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^2: DBI performance problem with MySQL
by e2ka (Acolyte) on Mar 08, 2005 at 22:12 UTC

    Thanks for your informative reply. I tried some of your suggestions, but am not yet able to solve the problem.

    First I should mention that this is an example I made by simplifying a much larger API I've been developing, so some things are done here for reasons that aren't apparent in this simplified example.

    Also, I realize there are things I can do to get faster insertion performance (such as using MySQL's LOAD DATA mechanism), but I want the flexibility of a Perl module that interacts with this database. I should change my original "as fast as possible" to "as fast as I can do it with Perl DBI", which should be plenty fast as long as the insertion rate doesn't keep collapsing the way it does now!
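
    (For reference, the LOAD route would look something like the sketch below; the file name is a placeholder, and LOCAL INFILE has to be enabled on both client and server.)

    # Bulk-load a tab-separated file straight into the table. Fastest
    # path, but it bypasses the Perl API I'm trying to build.
    $dbh->do(q[
        LOAD DATA LOCAL INFILE '/tmp/cond_dummy.tsv'
        INTO TABLE COND_dummy
        FIELDS TERMINATED BY '\t'
    ]);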

    (1.) Did that. I was fooling around with AutoCommit and had forgotten to set it back to 0 as I intended.

    (2.) I tried changing it to INSERT ... VALUES. Same behavior.

    (3.) You mean that I have 3 keys on COND_dummy, despite SHOW INDEXES saying I only have the two that I defined? Also, I am relying on the foreign key checks to ensure integrity of the relationship, so I think I should keep it. I tried to use SET FOREIGN_KEY_CHECKS=0 and the behavior was the same.

    (4.) The values I will really store will be FLOAT. (These are measurements)

    (5.) Not sure what you mean by this. Could you explain a bit more?

    The most interesting thing I got from your post is that A) DBD::mysql wasn't doing real binding and B) It will soon. I'm going to try the beta software and see if the server-prepared statements make the difference.

    The real question is whether that will make the difference. The Java example I tried (made by a colleague helping me with this problem) used J/Connector 3.0.11... maybe it has "real" prepared statements...

    Here is a changed section of my code:

    # insert and write performance
    print "Begin insertions...\n\n";
    my $iov_seed = time;
    my $logic_id = 9999;

    my $count       = 0;
    my $MAXCOUNT    = 1000;
    my $total       = 0;
    my $start       = gettimeofday;
    my $commit_cnt  = 0;
    my $commit_size = 100;

    $sql = qq[ SELECT logic_id, since, till
                 FROM $cond_name
                WHERE logic_id = ?
                  AND ((since >= ? AND since < ?)
                    OR (till > ? AND till < ?)
                    OR (? >= since AND ? < till))
                LIMIT 1 ];
    my $overlap_check_h = $dbh->prepare_cached($sql);

    # $sql = qq[INSERT INTO $cond_name SET logic_id = ?,
    #                                      since = ?,
    #                                      till = ?,
    #                                      value = ?];
    $sql = qq[ INSERT INTO $cond_name VALUES (?, ?, ?, ?) ];
    my $insert_h = $dbh->prepare_cached($sql);

    $dbh->do( qq[ SET FOREIGN_KEY_CHECKS=0 ] );
    for (0..10000) {
        my ($t1, $t2) = make_iov();

        $overlap_check_h->bind_param(1, $logic_id, SQL_INTEGER);
        $overlap_check_h->bind_param(2, $t1, SQL_DATETIME);
        $overlap_check_h->bind_param(3, $t2, SQL_DATETIME);
        $overlap_check_h->bind_param(4, $t1, SQL_DATETIME);
        $overlap_check_h->bind_param(5, $t2, SQL_DATETIME);
        $overlap_check_h->bind_param(6, $t1, SQL_DATETIME);
        $overlap_check_h->bind_param(7, $t2, SQL_DATETIME);
        $overlap_check_h->execute();

        my @overlap = $overlap_check_h->fetchrow_array();
        if (@overlap) {
            print "OVERLAP! That's impossible!\n";
        } else {
            $insert_h->bind_param(1, $logic_id, SQL_INTEGER);
            $insert_h->bind_param(2, $t1, SQL_DATETIME);
            $insert_h->bind_param(3, $t2, SQL_DATETIME);
            $insert_h->bind_param(4, 3.1415, SQL_FLOAT);
            $insert_h->execute();

            $count++;
            $commit_cnt++;
            if ($commit_cnt == $commit_size) {
                $dbh->commit();
                $commit_cnt = 0;
            }
            if ($count == $MAXCOUNT) {
                report();
            }
        }
    }
    $dbh->do( qq[ SET FOREIGN_KEY_CHECKS=1 ] );
    $dbh->commit();
    $dbh->disconnect();

    and the results of these changes...

    [#insert]  [date time]            [ins/s]
         1000  2005-03-08 16:02:23   1053.265
         2000  2005-03-08 16:02:25    555.494
         3000  2005-03-08 16:02:27    374.252
         4000  2005-03-08 16:02:31    280.796
         5000  2005-03-08 16:02:35    221.845
         6000  2005-03-08 16:02:41    189.839
         7000  2005-03-08 16:02:47    162.636
         8000  2005-03-08 16:02:54    142.633
         9000  2005-03-08 16:03:02    126.241
        10000  2005-03-08 16:03:10    114.398

    ... and here is the performance of the Java test, which accomplishes the same thing. This is my desired level of performance, which I think should be attainable with DBI...

        0:  587.5440658049354 ins/s
     1000: 1117.31843575419   ins/s
     2000: 1658.374792703151  ins/s
     3000: 1968.503937007874  ins/s
     4000: 1968.503937007874  ins/s
     5000: 2150.537634408602  ins/s
     6000: 2212.3893805309735 ins/s
     7000: 2262.443438914027  ins/s
     8000: 2188.1838074398247 ins/s
     9000: 2066.115702479339  ins/s
    10000: 2314.814814814815  ins/s

    Thanks again for your help.

    -e2ka

    Update

    Just installed DBD::mysql 2.9015_1 and re-ran the test. Same behavior.

    [#insert]  [date time]            [ins/s]
         1000  2005-03-08 16:32:06   1109.196
         2000  2005-03-08 16:32:08    568.175
         3000  2005-03-08 16:32:11    373.548
         4000  2005-03-08 16:32:14    284.840
         5000  2005-03-08 16:32:19    227.884
         6000  2005-03-08 16:32:24    189.410
         7000  2005-03-08 16:32:31    154.263
         8000  2005-03-08 16:32:38    137.563
         9000  2005-03-08 16:32:47    110.940
        10000  2005-03-08 16:32:59     82.182

    I'm assuming all I need to do is install it and add mysql_server_prepare=1 to my connect string.
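
    Something like this is what I have in mind (database name and credentials are placeholders):

    # mysql_server_prepare=1 asks DBD::mysql to use real server-side
    # prepared statements instead of client-side emulation.
    my $dbh = DBI->connect(
        'DBI:mysql:database=test;host=localhost;mysql_server_prepare=1',
        $user, $password,
        { RaiseError => 1, AutoCommit => 0 },
    );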

    Update 2

    I should also mention that I've tried setting the DBI_TRACE and it looks like everything is being executed correctly and in the right order.

      I should also mention that I've tried setting the DBI_TRACE and it looks like everything is being executed correctly and in the right order.

      It's not an ordering issue. If the column is of type DATETIME and the value it's being compared against is a VARCHAR, then no index will be used, meaning you're going to do a full table scan. Do this:

      • Turn DBI_TRACE on (a minimal sketch follows this list).
      • Cut'n'paste the EXACT query that DBI_TRACE tells you is being executed.
      • Do an EXPLAIN on that query.
      • Do the same thing with the Java version and look at the difference.
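
      For the first step, a sketch of turning tracing on from inside the script (the log path is arbitrary); the logged SQL can then be pasted into the mysql client behind an EXPLAIN:

      use DBI;

      # Trace level 2 logs each statement as it is prepared and executed;
      # equivalent to setting DBI_TRACE=2=/tmp/dbi.log in the environment.
      DBI->trace(2, '/tmp/dbi.log');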

      The behavior you're describing is the increasing cost of a SELECT having to do several full table scans. That's why I'm 99% sure it's the placeholder thing. I've run into it before and actually tried patching DBD::mysql to do intelligent binding for numeric values, but it didn't go anywhere.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

        O esteemed saint, when I turn on DBI trace (even at the maximum level), the only query I see is the one sent to prepare(), which still has placeholders, so I can't run EXPLAIN on it directly. But say I fill in those placeholders with string datetimes:

        EXPLAIN SELECT logic_id, since, till
          FROM COND_dummy
         WHERE logic_id = 9999
           AND ((since >= '2005-01-01 00:00:00' AND since < '2005-01-01 00:00:01')
             OR (till > '2005-01-01 00:00:00' AND till < '2005-01-01 00:00:01')
             OR ('2005-01-01 00:00:00' >= since AND '2005-01-01 00:00:01' < till))
         LIMIT 1;

        It says that it is using my indexes.

        +----+-------------+------------+------+---------------+---------+---------+-------+------+--------------------------+
        | id | select_type | table      | type | possible_keys | key     | key_len | ref   | rows | Extra                    |
        +----+-------------+------------+------+---------------+---------+---------+-------+------+--------------------------+
        |  1 | SIMPLE      | COND_dummy | ref  | PRIMARY,IoV   | PRIMARY |       4 | const |    1 | Using where; Using index |
        +----+-------------+------------+------+---------------+---------+---------+-------+------+--------------------------+
        

        And just to test the theory that this has anything to do with DATETIMEs, VARCHARs, and conversions at all, I changed the DATETIME columns to INT and ran the script again. Exact same behavior. In fact, somehow, using INTs for the since and till columns made everything slower!

        [#insert]  [date time]            [ins/s]
             1000  2005-03-09 10:58:10    649.516
             2000  2005-03-09 10:58:14    287.810
             3000  2005-03-09 10:58:19    212.805
             4000  2005-03-09 10:58:26    136.577
             5000  2005-03-09 10:58:34    118.965
             6000  2005-03-09 10:58:45     94.537
             7000  2005-03-09 10:58:57     84.296
             8000  2005-03-09 10:59:10     74.197
             9000  2005-03-09 10:59:26     64.026
            10000  2005-03-09 10:59:44     54.799

        It is a mystery.

Re^2: DBI performance problem with MySQL
by mshiltonj (Sexton) on Mar 10, 2005 at 14:33 UTC
    DBD::mysql v. 2.9005_3 has been out for about five months. Has anyone used it to their satisfaction? If so, what is your confidence level?