Is (20 million) records a problem for mySQL?
No. However, you (or your DBA) should spend some time planning and benchmarking regarding appropriate storage engines, indexes and backup strategy.
Which perl module will be the fastest and simplest to use?
If you know SQL already and are happy to write that part yourself then DBI would be an obvious first choice.
| [reply] |
Q1: Many Monks utilize mySQL for projects from trivially small to gigundo. Though I am not one using large data sets, I suspect you'll have no problem. But an authoritative answer will have to come from another responder... and frankly, I suspect you'll do better and/or quicker by posing Q1. on a site specializing in mySQL or, perhaps, by reading mySQL's online docs.
Q2: DBI and DBD::mysql make a tag-team that should make your work fast and simple.
Welcome to PM... but strongly suggest you read On asking for help and How do I post a question effectively? and try an outside search engine or Super Search before posting such a broad question with (implicit) answers all over this site.
Update: changed links Q2 answer to aliases for readability.
| [reply] |
MySQL can work with that size of data volume easy enough. You may want to ask the hosting company however if they do not have size limits imposed on your account.Access to any kind of database should be done through DBI and in this case DBD::MySQL, either directly or through (for instance) DBx::Class.
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
| [reply] |
Historically, people who have chosen MySQL have typically done so because it was open source and freely available. Now that MySQL is owned by Oracle, and (for reasons I haven't taken the time to look into) a "separately owned fork" of the MySQL code base has been established, known as MariaDB, a number of folks are switching over to the latter, which is (currently) "plug-and-play" compatible in every regard with MySQL. I point this out just in case you're at a stage of deciding which RDB package to install on the CentOS-7 server (and in case the finer points of open source licensing might be an issue for you).
Another alternative that is also open source and freely available is Postgres, which differs significantly from MySQL in terms of the "options" and "extensions to 'vanilla' SQL" that are offered. Different folks have different opinions about the relative merits of Postgres and MySQL/MariaDB, and both camps have some sensible reasons for their preferences (e.g. because of the particular extensions offered by one or the other, the relative ease of maintenance and back-ups, etc.).
Both RDB systems are well documented and fully capable of handling the dimensions you're looking at, but when you write SQL schemas and queries for one, there's a non-null (and non-trivial) chance that you'd have to change a number of details (in both schema definitions and production code) if you later decided to switch over to the other. So, if you're still at the planning stage, and have the choice, you'll want to check out both and "choose wisely" for your particular application.
Both systems are fully supported by DBI and DBD modules (and as a result, all the supplemental DBI-related modules work equally well with both systems). To the extent that your sense of "fastest and easiest to use" is a matter of being sure that perl scripting can be sensibly compact and efficient, there's no cause for worry with either DB.
Of course, to the extent that "fastest and easiest to use" could mean "ability to get something running with the minimum of study and planning," things could also go remarkably badly with either system. Knowing (vs. not knowing) how and when to use additional, non-default indexes, how and when to use transactions and commits, etc., can lead to orders-of-magnitude differences in performance, no matter which database engine you choose. | [reply] |