Thanks for your informative reply. I tried some of your suggestions, but I have not yet been able to solve the problem.

First I should mention that this example comes from simplifying a much larger API I have been developing, so some things are done for reasons that are not apparent in this stripped-down version.

Also, I realize there are things I can do to get faster insertion performance (such as using MySQL's LOAD mechanism), but I want the flexibility of making a Perl module to interact with this database. I should change my original "as fast as possible" to "as fast as I can do it with Perl DBI", which should be plenty fast, provided the speed doesn't keep falling off the way it does here!
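
(For completeness, the bulk-load route I'm setting aside would look something like this. The file path is made up, and LOCAL INFILE has to be enabled on both client and server:)

    # the bulk-load alternative I'm not taking; the file path is made up,
    # and the file would hold one tab-separated (logic_id, since, till,
    # value) row per record
    $dbh->do(q[ LOAD DATA LOCAL INFILE '/tmp/cond_rows.tsv'
                INTO TABLE COND_dummy ]);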

(1.) Did that. I was fooling around with AutoCommit and forgot to set it back to 0 as I intended.

(2.) I tried changing it to INSERT ... VALUES. Same behavior.

(3.) You mean that I have 3 keys on COND_dummy, even though SHOW INDEXES says I only have the two I defined? Also, I am relying on the foreign key checks to ensure the integrity of the relationship, so I think I should keep them. I tried SET FOREIGN_KEY_CHECKS=0 and the behavior was the same.

(4.) The values I will really store will be FLOAT (they are measurements).

(5.) Not sure what you mean by this. Could you explain a bit more?

The most interesting things I got from your post are that A) DBD::mysql isn't doing real binding and B) it will be soon. I'm going to try the beta software and see if server-prepared statements make the difference.

The real question is whether that will be enough. The Java example I tried (made by a colleague helping me with this problem) used Connector/J 3.0.11... maybe it has "real" prepared statements...

Here is the changed section of my code:

# insert and write performance
print "Begin insertions...\n\n";
my $iov_seed    = time;
my $logic_id    = 9999;
my $count       = 0;
my $MAXCOUNT    = 1000;
my $total       = 0;
my $start       = gettimeofday;
my $commit_cnt  = 0;
my $commit_size = 100;

$sql = qq[ SELECT logic_id, since, till FROM $cond_name
           WHERE logic_id = ?
             AND ((since >= ? AND since < ?)
               OR (till > ? AND till < ?)
               OR (? >= since AND ? < till))
           LIMIT 1 ];
my $overlap_check_h = $dbh->prepare_cached($sql);

# $sql = qq[INSERT INTO $cond_name SET logic_id = ?,
#                                      since    = ?,
#                                      till     = ?,
#                                      value    = ?];
$sql = qq[ INSERT INTO $cond_name VALUES (?, ?, ?, ?) ];
my $insert_h = $dbh->prepare_cached($sql);

$dbh->do( qq[ SET FOREIGN_KEY_CHECKS=0 ] );

for (0..10000) {
    my ($t1, $t2) = make_iov();

    $overlap_check_h->bind_param(1, $logic_id, SQL_INTEGER);
    $overlap_check_h->bind_param(2, $t1, SQL_DATETIME);
    $overlap_check_h->bind_param(3, $t2, SQL_DATETIME);
    $overlap_check_h->bind_param(4, $t1, SQL_DATETIME);
    $overlap_check_h->bind_param(5, $t2, SQL_DATETIME);
    $overlap_check_h->bind_param(6, $t1, SQL_DATETIME);
    $overlap_check_h->bind_param(7, $t2, SQL_DATETIME);
    $overlap_check_h->execute();
    my @overlap = $overlap_check_h->fetchrow_array();

    if (@overlap) {
        print "OVERLAP! That's impossible!\n";
    } else {
        $insert_h->bind_param(1, $logic_id, SQL_INTEGER);
        $insert_h->bind_param(2, $t1, SQL_DATETIME);
        $insert_h->bind_param(3, $t2, SQL_DATETIME);
        $insert_h->bind_param(4, 3.1415, SQL_FLOAT);
        $insert_h->execute();
        $count++;
        $commit_cnt++;
        if ($commit_cnt == $commit_size) {
            $dbh->commit();
            $commit_cnt = 0;
        }
        if ($count == $MAXCOUNT) {
            report();
        }
    }
}

$dbh->do( qq[ SET FOREIGN_KEY_CHECKS=1 ] );
$dbh->commit();
$dbh->disconnect();
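
(One note if you're copying this: the SQL_INTEGER / SQL_DATETIME / SQL_FLOAT constants in the bind_param() calls come from DBI's :sql_types export group, which the full script pulls in near the top:)

    # near the top of the full script: exports the SQL_* type constants
    use DBI qw(:sql_types);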

and the results of these changes...

[#insert]  [date time]          [ins/s]
     1000  2005-03-08 16:02:23  1053.265
     2000  2005-03-08 16:02:25   555.494
     3000  2005-03-08 16:02:27   374.252
     4000  2005-03-08 16:02:31   280.796
     5000  2005-03-08 16:02:35   221.845
     6000  2005-03-08 16:02:41   189.839
     7000  2005-03-08 16:02:47   162.636
     8000  2005-03-08 16:02:54   142.633
     9000  2005-03-08 16:03:02   126.241
    10000  2005-03-08 16:03:10   114.398

... and here is the performance of the Java test, which accomplishes the same thing. This is my desired level of performance, and I think it should be attainable with DBI...

    0:  587.5440658049354 ins/s
 1000: 1117.31843575419   ins/s
 2000: 1658.374792703151  ins/s
 3000: 1968.503937007874  ins/s
 4000: 1968.503937007874  ins/s
 5000: 2150.537634408602  ins/s
 6000: 2212.3893805309735 ins/s
 7000: 2262.443438914027  ins/s
 8000: 2188.1838074398247 ins/s
 9000: 2066.115702479339  ins/s
10000: 2314.814814814815  ins/s

Thanks again for your help.

-e2ka

Update

Just installed DBD::mysql 2.9015_1 and re-ran the test. Same behavior:

[#insert]  [date time]          [ins/s]
     1000  2005-03-08 16:32:06  1109.196
     2000  2005-03-08 16:32:08   568.175
     3000  2005-03-08 16:32:11   373.548
     4000  2005-03-08 16:32:14   284.840
     5000  2005-03-08 16:32:19   227.884
     6000  2005-03-08 16:32:24   189.410
     7000  2005-03-08 16:32:31   154.263
     8000  2005-03-08 16:32:38   137.563
     9000  2005-03-08 16:32:47   110.940
    10000  2005-03-08 16:32:59    82.182

I'm assuming all I need to do is install it and add mysql_server_prepare=1 to my connect string.
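
For reference, here is roughly what my connect call looks like with that change (database name, host, and credentials below are placeholders):

    # placeholders for database/host/credentials; mysql_server_prepare=1
    # asks DBD::mysql to use server-side prepared statements instead of
    # emulating placeholders on the client
    my $dbh = DBI->connect(
        'DBI:mysql:database=testdb;host=localhost;mysql_server_prepare=1',
        'user', 'password',
        { RaiseError => 1, AutoCommit => 0 },
    );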

Update 2

I should also mention that I've tried setting DBI_TRACE, and it looks like everything is being executed correctly and in the right order.
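
(For anyone following along, tracing can be turned on from the environment or inside the script. The level and log file name here are just examples:)

    # from the shell:
    #   $ DBI_TRACE=2 perl insert_test.pl
    # or from inside the script:
    DBI->trace(2, 'dbi_trace.log');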

Re^3: DBI performance problem with MySQL
by dragonchild (Archbishop) on Mar 09, 2005 at 03:08 UTC
    I should also mention that I've tried setting DBI_TRACE, and it looks like everything is being executed correctly and in the right order.

    It's not an ordering issue. If the column is of type DATETIME and the value it's being compared against is a VARCHAR, then no index will be used, meaning you're going to do a full table scan. Do this:

    • Turn DBI_TRACE on.
    • Cut'n'paste the EXACT query that DBI_TRACE tells you is being executed.
    • Do an EXPLAIN on that query (a minimal sketch follows this list).
    • Do the same thing with the Java version and look at the difference.
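
    A minimal sketch of steps 2 and 3 ($traced_sql is hypothetical: it stands in for the exact query string, literal values and all, pasted from the trace output):

        # $traced_sql: the exact SQL that DBI_TRACE showed being executed
        my $plan = $dbh->selectall_arrayref('EXPLAIN ' . $traced_sql);
        print join("\t", map { defined $_ ? $_ : 'NULL' } @$_), "\n"
            for @$plan;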

    The behavior you're describing is the increasing cost of a SELECT having to do repeated full table scans. That's why I am 99% sure it's the placeholder thing. I've run into it before and actually tried patching DBD::mysql to do intelligent binding for numeric values, but it didn't go anywhere.
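
    You can see the quoting I mean by asking the driver directly. With emulated placeholders, every bound value goes to the server as a quoted string unless quote() is told it is numeric:

        use DBI qw(:sql_types);
        print $dbh->quote(9999), "\n";               # '9999' (a string literal)
        print $dbh->quote(9999, SQL_INTEGER), "\n";  # 9999   (left unquoted)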

      O esteemed saint, when I turn on DBI trace (even at the maximum level), the only query I see is the one sent to prepare(), which has placeholders (I can't execute EXPLAIN on that). But say I fill in those placeholders with string datetimes:

      EXPLAIN SELECT logic_id, since, till FROM COND_dummy
      WHERE logic_id = 9999
        AND ((since >= '2005-01-01 00:00:00' AND since < '2005-01-01 00:00:01')
          OR (till > '2005-01-01 00:00:00' AND till < '2005-01-01 00:00:01')
          OR ('2005-01-01 00:00:00' >= since AND '2005-01-01 00:00:01' < till))
      LIMIT 1;

      It says that it is using my indexes.

      +----+-------------+------------+------+---------------+---------+---------+-------+------+--------------------------+
      | id | select_type | table      | type | possible_keys | key     | key_len | ref   | rows | Extra                    |
      +----+-------------+------------+------+---------------+---------+---------+-------+------+--------------------------+
      |  1 | SIMPLE      | COND_dummy | ref  | PRIMARY,IoV   | PRIMARY |       4 | const |    1 | Using where; Using index |
      +----+-------------+------------+------+---------------+---------+---------+-------+------+--------------------------+
      

      And just to test the theory that it has anything to do with DATETIME/VARCHAR conversions at all, I changed the DATETIME columns to INT and ran the script again. Exact same behavior. In fact, somehow, using INTs for the since and till columns made everything slower!

      [#insert]  [date time]          [ins/s]
           1000  2005-03-09 10:58:10  649.516
           2000  2005-03-09 10:58:14  287.810
           3000  2005-03-09 10:58:19  212.805
           4000  2005-03-09 10:58:26  136.577
           5000  2005-03-09 10:58:34  118.965
           6000  2005-03-09 10:58:45   94.537
           7000  2005-03-09 10:58:57   84.296
           8000  2005-03-09 10:59:10   74.197
           9000  2005-03-09 10:59:26   64.026
          10000  2005-03-09 10:59:44   54.799

      It is a mystery.

        There's using an index and then there's utilizing an index. Your query is going to use the composite PRIMARY key because the first column in that key is "logic_id". However, if you comment out the PK and leave only the IoV index, I think you'll find that it's not using the index.
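
        One way to check is to drop the PRIMARY KEY and re-run the EXPLAIN, on a throwaway copy of the table created without the foreign key (InnoDB wants an index backing the FK, so the drop may be refused on the real table):

            # scratch copy of COND_dummy only: this drops the PK for real
            $dbh->do(q[ ALTER TABLE COND_dummy DROP PRIMARY KEY ]);
            my $plan = $dbh->selectall_arrayref(
                q[ EXPLAIN SELECT logic_id, since, till FROM COND_dummy
                   WHERE logic_id = 9999 LIMIT 1 ]);
            print join("\t", map { defined $_ ? $_ : 'NULL' } @$_), "\n"
                for @$plan;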
