natol44 has asked for the wisdom of the Perl Monks concerning the following question:

Hello!

I need to manage a database of about 20 million records. Each record has 5 fields, for a total of about 100 bytes per record.

It will be hosted on a *nix server running CentOS 7.

For various reasons I want to do it in Perl. My questions are:

1- Is the number of records a problem for MySQL?

If I count correctly, the DB size should be "only" about 2 GB for the data.

2- Which Perl module will be the fastest and simplest to use? Queries will be very simple: searches on at most 2 fields together.

Thank you for your help!

Replies are listed 'Best First'.
Re: MySQL database with a lot of records
by hippo (Archbishop) on May 17, 2015 at 17:28 UTC
    Is (20 million) records a problem for mySQL?

    No. However, you (or your DBA) should spend some time planning and benchmarking regarding appropriate storage engines, indexes and backup strategy.

    Which perl module will be the fastest and simplest to use?

    If you know SQL already and are happy to write that part yourself then DBI would be an obvious first choice.
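    A minimal DBI sketch of the kind of two-field search described in the question. To keep the snippet self-contained it connects to an in-memory SQLite database; against MySQL only the DSN changes (e.g. `dbi:mysql:database=mydb;host=localhost` plus credentials), and the DBI calls stay identical. The table and field names are invented for the example.

    ```perl
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use DBI;

    # In-memory SQLite stands in for MySQL so this runs anywhere;
    # for MySQL, swap the DSN for 'dbi:mysql:database=mydb;host=localhost'.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
        { RaiseError => 1, AutoCommit => 1 } );

    # Hypothetical schema: 5 small fields per record, as in the question.
    $dbh->do('CREATE TABLE records (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)');
    $dbh->do('INSERT INTO records (field1, field2) VALUES (?, ?)', undef, 'foo', 'bar');

    # The two-field search, with placeholders rather than interpolated values.
    my $sth = $dbh->prepare(
        'SELECT id, field1, field2 FROM records WHERE field1 = ? AND field2 = ?');
    $sth->execute('foo', 'bar');

    my @matches;
    while ( my $row = $sth->fetchrow_hashref ) {
        push @matches, $row;
        print "$row->{id}\t$row->{field1}\t$row->{field2}\n";
    }
    ```

    Prepared statements with placeholders also mean the driver quotes values for you, which matters once you are running millions of these queries.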

Re: MySQL database with a lot of records
by ww (Archbishop) on May 17, 2015 at 17:32 UTC

    Q1: Many Monks use MySQL for projects from trivially small to gigantic. Though I am not one who uses large data sets, I suspect you'll have no problem. But an authoritative answer will have to come from another responder... and frankly, I suspect you'll do better and/or get an answer quicker by posing Q1 on a site specializing in MySQL or, perhaps, by reading MySQL's online docs.

    Q2: DBI and DBD::mysql make a tag-team that should make your work fast and simple.

    Welcome to PM... but I strongly suggest you read On asking for help and How do I post a question effectively?, and try an outside search engine or Super Search before posting such a broad question with (implicit) answers all over this site.

    Update: changed the links in the Q2 answer to aliases for readability.

Re: MySQL database with a lot of records
by CountZero (Bishop) on May 17, 2015 at 22:21 UTC
    MySQL can work with that data volume easily enough. You may, however, want to ask the hosting company whether they impose size limits on your account.

    Access to any kind of database should be done through DBI and, in this case, DBD::mysql, either directly or through (for instance) DBIx::Class.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: MySQL database with a lot of records
by graff (Chancellor) on May 18, 2015 at 08:52 UTC
    Historically, people who have chosen MySQL have typically done so because it was open source and freely available. Now that MySQL is owned by Oracle, and (for reasons I haven't taken the time to look into) a "separately owned fork" of the MySQL code base has been established, known as MariaDB, a number of folks are switching over to the latter, which is (currently) "plug-and-play" compatible in every regard with MySQL. I point this out just in case you're at a stage of deciding which RDB package to install on the CentOS-7 server (and in case the finer points of open source licensing might be an issue for you).

    Another alternative that is also open source and freely available is Postgres, which differs significantly from MySQL in terms of the "options" and "extensions to 'vanilla' SQL" that are offered. Different folks have different opinions about the relative merits of Postgres and MySQL/MariaDB, and both camps have some sensible reasons for their preferences (e.g. because of the particular extensions offered by one or the other, the relative ease of maintenance and back-ups, etc.).

    Both RDB systems are well documented and fully capable of handling the dimensions you're looking at, but when you write SQL schemas and queries for one, there's a non-zero (and non-trivial) chance that you'd have to change a number of details (in both schema definitions and production code) if you later decided to switch over to the other. So, if you're still at the planning stage, and have the choice, you'll want to check out both and "choose wisely" for your particular application.

    Both systems are fully supported by DBI and DBD modules (and as a result, all the supplemental DBI-related modules work equally well with both systems). To the extent that your sense of "fastest and easiest to use" is a matter of being sure that perl scripting can be sensibly compact and efficient, there's no cause for worry with either DB.

    Of course, to the extent that "fastest and easiest to use" could mean "ability to get something running with the minimum of study and planning," things could also go remarkably badly with either system. Knowing (vs. not knowing) how and when to use additional, non-default indexes, how and when to use transactions and commits, etc., can lead to orders-of-magnitude differences in performance, no matter which database engine you choose.
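    The two habits mentioned above can be sketched concretely: batching inserts inside a single transaction instead of committing row by row, and adding a composite index over exactly the fields you search on. Again an in-memory SQLite database stands in so the snippet runs anywhere; the identical DBI calls work against MySQL or Postgres with the appropriate DSN. All names here are invented for the example.

    ```perl
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use DBI;

    # In-memory stand-in database; swap the DSN for dbi:mysql:... or dbi:Pg:...
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
        { RaiseError => 1, AutoCommit => 1 } );

    $dbh->do('CREATE TABLE records (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)');

    # One transaction around the whole batch: one commit for 1000 rows,
    # not 1000 commits. On bulk loads this is where the orders-of-magnitude
    # differences mentioned above tend to come from.
    $dbh->begin_work;
    my $ins = $dbh->prepare('INSERT INTO records (field1, field2) VALUES (?, ?)');
    $ins->execute( "value$_", 'group' . ( $_ % 10 ) ) for 1 .. 1000;
    $dbh->commit;

    # A composite index covering both searched fields lets the engine
    # satisfy the two-field queries without a full table scan.
    $dbh->do('CREATE INDEX idx_f1_f2 ON records (field1, field2)');

    my ($count) = $dbh->selectrow_array(
        'SELECT COUNT(*) FROM records WHERE field1 = ? AND field2 = ?',
        undef, 'value7', 'group7' );
    print "matches: $count\n";
    ```

    At 20 million rows the index choice matters far more than at the 1000 rows used here; `EXPLAIN` on the real server is the way to confirm the index is actually being used.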
