Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks!
I have been using Perl for quite some time now, but this is the first time I need to use it together with a MySQL database, so I have some questions:
1) I need to store ~10,000,000 entries in a table PEOPLE with the following columns:
ID (primary key), name VARCHAR(20), sequence (TEXT), label (TEXT)
Searches in this table will be done by "name". Will a database created with CREATE DATABASE Testdb DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci; be OK for this number of entries?
2) Should I use indexes, like: create index name_index on PEOPLE (name); to speed up the search process?
There will be no need for regular inserts or updates in the table, so my only concern is whether this setup can handle the size of my data. Also, I'd welcome any further suggestions for making the searches more efficient, for instance when the user wants to look up entry #5,809,672, which has the name 'Peter'.
Thank you!

Re: Questions on Perl & Mysql
by timray (Initiate) on Jun 26, 2010 at 09:32 UTC
1) The collation only specifies the comparison rules for strings (how they are sorted and matched); it has no bearing on how many rows the database can hold. Performance will depend on indexing.

2) You should definitely make sure the ID is indexed, as this will dramatically improve performance on a table this large; declaring ID as the PRIMARY KEY already gives you that index in MySQL. Without an index, the whole table is scanned looking for the item: to find out that an entry is not present, for example, every one of the 10 million records has to be checked. As you will rarely or never add or remove entries, you will not be paying the cost of updating indexes on every write (which is what usually makes extra indexes expensive), so you can add further indexes (e.g. on name, or a combined ID/name index). If the ID is unique, a lookup by ID will only ever return one record, so other indexes are unnecessary unless you want to find entries without specifying the ID (which is your case, since you search by name).

You might also consider the SDBM_File module if this is only a simple key/value lookup (e.g. if you always access records by ID). It is straightforward to tie the database to a hash, so that the SDBM database is accessed as though it were an ordinary Perl hash. An SDBM lookup is typically much faster than a MySQL query, since there is no client/server round trip or SQL parsing involved. This works very well as long as the data really is just key/value pairs.
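A minimal sketch of the tied-hash approach, using only core modules (SDBM_File, Fcntl, File::Temp). The file location, the tab-separated value layout, and the two sample records are illustrative assumptions, not anything from the original post. One caveat worth checking before using this for real data: SDBM limits the combined size of each key/value pair to roughly 1 KB (see the AnyDBM_File documentation), so long `sequence` values may not fit and would call for a different DBM backend.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);   # flags for opening/creating the DBM files
use SDBM_File;                  # core Perl module, no database server needed
use File::Temp qw(tempdir);

# Hypothetical location for the DBM files; SDBM creates "<base>.pag" and "<base>.dir".
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/people";

# Tie a hash to the on-disk SDBM database: hash reads/writes become disk lookups.
tie(my %people, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $base: $!";

# Illustrative records only: key = ID, value = name, sequence and label joined by tabs.
# Each key+value pair must stay under SDBM's ~1 KB pair-size limit.
$people{5809672} = join "\t", 'Peter', 'ACGT', 'example';
$people{42}      = join "\t", 'Maria', 'TTGA', 'other';

# A lookup by ID is a single keyed fetch, not a scan over every record.
my ($name, $sequence, $label) = split /\t/, $people{5809672};
print "$name $sequence $label\n";

# Flush and close the database files.
untie %people;
```

Because the data is keyed by ID, this only helps when you already know the ID; looking records up by name would need a second tied hash mapping each name to its IDs, or the MySQL index on name discussed above.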