cosmicperl has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
  I updated a large mysql database so that the tables took up less space, I set the right field types (they were all varchar rather than a mixture of int's, reals, etc) I also fixed all the dates which were in the format MM:DD:YYYY to 'YYYYMMDD' so that numerical comparisons could be used. I flushed and optimized the tables. The old database calls the script produced looked like:-
SELECT sum(numlog) FROM table1 WHERE date IN ( '12:04:2007', '12:05:20 +07', ... up to 500 odd dates)
They now look like:-
SELECT sum(numlog) FROM table1 WHERE date >= 20071204 AND date < 20080 +112
I fired it up, looking forward to what I expected to be a huge speed increase (the table I'm testing has over half a million records). Right now I want to cry as the new code it actually running slower. So I tested just the MySQL and it gave me the same results, trying out the new sql and a new take on the old:-
SELECT sum(numlog) FROM table1 WHERE date IN ( '20071204', '20071205', + ... 30 days)
The new one take about 4.2 seconds on average, the old one takes about 3.9 on average.
What gives???? How can all those OR's possibly be faster than a > and a <?


Lyle

Replies are listed 'Best First'.
Re: Perl and MySQL woes
by kyle (Abbot) on Jan 13, 2008 at 03:14 UTC

    This is really a MySQL question.

    I suggest you ask MySQL to explain both queries and see how they differ. My guess is that the one using IN is using an index for each of the dates you give it, and the other one is doing a scan of some kind.

      I only tend to code Perl so I guess I'm too used to coming here with my questions. My bad
Re: Perl and MySQL woes
by McDarren (Abbot) on Jan 13, 2008 at 03:38 UTC
    Side comment:
    SELECT sum(numlog) FROM table1 WHERE date >= 20071204 AND date < 20080 +112
    Can also be written as:
    SELECT sum(numlog) FROM table1 WHERE date BETWEEN 20071204 AND 2008011 +2
    --Darren

      Further Side Comment:

      You can use BETWEEN on date field types, not just INT. So if this entire thing was done so that he could use <= and => he could have done that with date field types directly.

      --
      I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
      Thanks Darren, I'll see if that speeds things up
Re: Perl and MySQL woes
by graff (Chancellor) on Jan 13, 2008 at 07:33 UTC
    Let's suppose the database table is set up in such a way that this "date" field is being indexed for rapid retrieval of specific values, but this index doesn't help for queries that end up having to do some test on every value (e.g. "greater than this and less than that") -- maybe such a query ends up just doing a full-table scan because the index can't be used that way. (Maybe the "explain" facility will make that clear.)

    If that's the case, it's possible that the rephrasing suggested by McDarren might save the day, if it allows the query conditions to be tested against the values in the index, rather than the values in the table itself.

    But in case that doesn't pan out, it becomes a question of how to specify the query for those 100's of specific values in a more sensible way (i.e. not just making a list of 100's of values conjoined with "OR"). One possible solution would be to create a temporary table (just one column is needed), load it with the target values, and set the query to something like:

    SELECT sum(numlog) FROM table1 WHERE date IN (SELECT date from temptab +le)
    Don't forget to drop (or at least clear) the temp table when you're done. And bear in mind that you would need to be very careful if this is being done as part of a web app. (You don't want different clients using the same temp table, and you don't want too many clients creating lots of big temp tables at the same time...)

    You are using a current version of MySQL, aren't you? The ability to do a sub-query select statement like that has been around since the transition to version 5.

Re: Perl and MySQL woes
by Gangabass (Vicar) on Jan 13, 2008 at 08:55 UTC

    Why don't you use DATE type for date data? Do you use index for this field?

Re: Perl and MySQL woes
by aquarium (Curate) on Jan 13, 2008 at 21:49 UTC
    a subselect causes a full table scan in most databases
    some databases will do a full scan if the where condition looks too complicated (for the db engine.) wherever possible make sure that values in tested column(s) don't need casting for comparison, and simplify the condition as much as possible. In this particular case, instead of >=20071204 can be written as >20071203.
    i suggest a great little book "SQL for smarties" by Joe Celko
    the hardest line to type correctly is: stty erase ^H

      i suggest a great little book "SQL for smarties" by Joe Celko

      This book is a great suggestion, but having just updated this week to the latest edition, it's not so little anymore...


      We're not surrounded, we're in a target-rich environment!