Re: Speed of MySQL DBs
by graff (Chancellor) on Feb 06, 2006 at 05:00 UTC
The situation that you describe raises a few questions:
- If new tables are created every time this script is run, how does the script come up with a set of distinct names for each new set of tables?
- How much data is the script loading into each new set of tables?
- Is this database "write only", or is there some other process that reads from all these tables that are being created? How does the querier manage all those tables? How does it decide which set of tables to query on a given occasion?
- What if the querier wants to integrate data across two or more runs of the table-creator script? How can you combine data from different runs if they are in many distinct, unrelated sets of tables?
- Assuming that each set of newly created tables is identical to every previously-created set (except for having different names assigned to the tables), wouldn't it be better to have a single set of tables, which this script would simply re-use (add to) on every run?
Regarding that last item, it would seem easy enough to take whatever method you are using at present to invent distinct table names, and instead have a single set of tables where that extra chunk of information is stored into the rows that are created on a given run. In other words, instead of this:
run1_tbla: (fld1, fld2, fld3, ...)
run1_tblb: (fldx, fldy, fldz, ...)
run2_tbla: (fld1, fld2, fld3, ...)
run2_tblb: ...
You could just have this:
table_a (run_id, fld1, fld2, fld3, ...)
table_b (run_id, fldx, fldy, fldz, ...)
Maybe "run_id" could be something like the date/time of the run, or whatever. | [reply] [d/l] [select] |
Hi and thanks for your response,
I'll try to answer your questions:
- The script uses IDs from a "master index" table to assign a name to each of the new tables created.
- The script is loading in a lot of data, sometimes millions of records - hence my reasoning for splitting the tables into smaller ones (and answering your 5th point).
- The DB is not write-only. It is queried by other scripts, which also use the IDs from the "master index" table to determine which set of tables they should be using.
- I want to prevent combining the data from each search (another reason to keep the data in separate tables).
- see points above :)
My whole aim is to try and speed up the MySQL side of things. I have achieved this so far by lessening the amount of data in the tables (i.e. instead of one table with 1,000,000 records - 900,000 of which are not needed for the specified query - I've split it up into 10 tables with 100,000 records each).
Hence that has led me to my initial question - will too many tables slow down the DB, and if so, would it be quicker to use separate databases?
I guess separate DBs would also have the advantage of better security, avoiding any mixing of the search data.
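In case it makes the setup clearer, a rough sketch of how the querying side works at the moment -- the names and connection details are simplified for illustration:
  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect( 'dbi:mysql:database=searchdb', 'user', 'pass',
                          { RaiseError => 1 } );

  my $search_id = 7;                        # id from the "master index" table
  my $table     = "run${search_id}_tbla";   # e.g. run7_tbla, created by the loader

  # Table names can't be bound as placeholders, so the name is interpolated;
  # $search_id comes from our own master index, not from user input.
  my $rows = $dbh->selectall_arrayref(
      "SELECT fld1, fld2, fld3 FROM $table WHERE fld1 = ?",
      undef, 'some value'
  );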
Given the added detail, I agree that one big table is not as good as a bunch of smaller tables. As to whether a lot of tables is better or worse than a set of distinct databases ... well, there is a whole chapter (7) in the MySQL manual about optimization. Have you looked at that yet?
I don't know offhand whether MySQL's data files are "one per table" or "one per database" -- and I'm not sure whether that even makes a difference for performance.
If you aren't familiar with the science and art of creating indexes on fields that are often used in the "where" clause, maybe it would be more effective to study that before trying "separate databases vs. separate tables in one database". Indexes are pretty easy to add to a running database, and can have a dramatic effect on query response time. (UPDATE: Then again, if you have other good reasons for using separate databases, go ahead -- I doubt it would damage performance at all, and might even help.)
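For example, a minimal sketch of adding an index and then asking MySQL for its query plan from DBI -- the table and field names are only placeholders:
  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect( 'dbi:mysql:database=searchdb', 'user', 'pass',
                          { RaiseError => 1 } );

  # Index the field(s) that turn up in your WHERE clauses.
  $dbh->do('ALTER TABLE table_a ADD INDEX idx_fld1 (fld1)');

  # EXPLAIN shows whether MySQL can use that index instead of scanning every row.
  my $plan = $dbh->selectall_arrayref(
      'EXPLAIN SELECT * FROM table_a WHERE fld1 = ?',
      { Slice => {} }, 'some value'
  );
  for my $row (@$plan) {
      printf "table=%s  type=%s  key=%s  rows=%s\n",
          map { defined $_ ? $_ : 'NULL' } @{$row}{qw(table type key rows)};
  }
If the "key" column comes back NULL and "rows" is close to the full table size, the query is doing a table scan and an index is likely to help.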
I think that the multi-table approach is a bad one. MySQL's single unique-key lookups are really fast, and you can create a single ID by concatenating the keys in sorted order with their values.
While 10 tables (and thus 30 files in MySQL) is not a lot, it is better to avoid multiple tables.
You also have the problem of cache expiry, because you should go through and clear out the expired data. This adds a level of complication where you first have to figure out which tables you have and then expire the data in all of them.
MySQL works very well with large tables - just do an EXPLAIN SELECT with a single unique key and you will see how little work it has to do to return your required results.
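Here's a rough sketch of what I mean, assuming your lookup criteria are simple key/value pairs -- the table name, columns and connection details are made up for illustration:
  use strict;
  use warnings;
  use DBI;

  # Build one deterministic lookup id by joining the search criteria
  # (key/value pairs) in sorted key order.
  sub make_lookup_id {
      my %criteria = @_;
      return join ';', map { "$_=$criteria{$_}" } sort keys %criteria;
  }

  my $dbh = DBI->connect( 'dbi:mysql:database=searchdb', 'user', 'pass',
                          { RaiseError => 1 } );

  # One big table with a UNIQUE key on that id.
  $dbh->do(q{
      CREATE TABLE IF NOT EXISTS search_cache (
          lookup_id VARCHAR(255) NOT NULL,
          result    TEXT,
          expires   DATETIME NOT NULL,
          UNIQUE KEY (lookup_id)
      )
  });

  my $id = make_lookup_id( term => 'perl', site => 'example.com' );

  # A single unique-key lookup is about as cheap as MySQL queries get.
  my ($result) = $dbh->selectrow_array(
      'SELECT result FROM search_cache WHERE lookup_id = ? AND expires > NOW()',
      undef, $id
  );

  # Expiring old data is one DELETE instead of a walk over many tables.
  $dbh->do('DELETE FROM search_cache WHERE expires <= NOW()');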
Re: Speed of MySQL DBs
by helphand (Pilgrim) on Feb 06, 2006 at 02:42 UTC
Why is it creating these tables? What's the underlying reason?
I have lots of search scripts; none of them create any tables - they search existing tables - so I'm having trouble understanding what you are doing.
Scott
A simplistic explanation is:
It's doing web-type searches and storing the information it finds in MySQL tables...
Without more info, it's going to be hard for anyone to give you good advice. Assuming you are using MyISAM tables, MySQL keeps each database in a separate directory and each table as a separate set of files in that database's directory. If you have thousands of tables, you may run into issues depending on the filesystem.
Scott
Re: Speed of MySQL DBs
by BUU (Prior) on Feb 06, 2006 at 03:06 UTC
Even if you're storing information in your database you should *never* be dynamically creating tables. The database is not designed for it. It's a bad idea. Store your information in pre-existing tables.
Re: Speed of MySQL DBs
by hesco (Deacon) on Feb 06, 2006 at 09:48 UTC
MySQL creates a distinct directory for each database you create. For each table you create, it creates three files. Two remain rather small and, I believe, handle the table structure; the third grows with the data. I once helped build what started out as a database of 1,590 tables. We broke it down to 159 databases of ten tables each and got far better performance. Unlike Postgres, you can actually back up and restore the filesystem files, though using the dump function gives you a more portable snapshot of the data.
-- Hugh
Huh? You can too back up PostgreSQL's files on disk, as long as the database itself isn't running (I'd presume MySQL would have the same limitation). If you want to back up/restore while the database is running, pg_dump/pg_restore are your friends.
That may be. But I had been given to understand that restoring such data to a new installation of pg would create issues related to the OIDs. Consequently I have stuck to the dump and restore methods, which recreate the OIDs when the data is re-inserted into the database tables.
But I have been able to restore an installation and preserve the existing passwords without requiring a reset, by copying the /var/lib/mysql/mysql directory and dropping it into a new instance (of the same version). /var/lib/mysql is the data path for the server, and the mysql directory inside it stores the user, password and privilege data for the installation.
-- Hugh