wink has asked for the wisdom of the Perl Monks concerning the following question:

I apologize for the broadness of this question. I'm hoping for some opinions from the monks on the most efficient data storage method for some scripts I'm writing. Descriptions are as follows:

1. The data is a listing of network switchports and their characteristics. It's currently stored in a text file that uses one line per port. The script that currently runs over it loads each line into a multilevel hash, and I'd like to keep that format (something like $ports{switch}{interface}{x|y|z}). I was thinking of using MLDBM, of course, but which underlying DB? A frequent cron job will read in the data, update it, and write it back out, and some CGI apps run by users will mostly read it but occasionally write changes. I was thinking maybe GDBM, since it apparently offers file locking, but I could also just use a semaphore file with flock. Would any type of DB have an advantage over the others?
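
Roughly, the approach I'm picturing would look like this (just a sketch, untested; the file paths, the GDBM back end, and the port fields are placeholders):

    use strict;
    use warnings;
    use Fcntl qw(:flock);
    use GDBM_File;                       # provides the GDBM_WRCREAT constant
    use MLDBM qw(GDBM_File Storable);    # nested values serialized via Storable

    # crude semaphore-file locking around each read/update cycle
    open my $sem, '>', '/var/run/ports.lock' or die "lock: $!";
    flock $sem, LOCK_EX or die "flock: $!";

    tie my %ports, 'MLDBM', '/var/lib/ports.db', &GDBM_WRCREAT, 0640
        or die "tie: $!";

    # MLDBM can't modify nested levels in place: fetch the top-level
    # entry, change it, and store the whole thing back
    my $switch = $ports{'sw1'} || {};
    $switch->{'Gi0/1'}{vlan} = 42;
    $ports{'sw1'} = $switch;

    untie %ports;
    close $sem;    # releases the lock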

2. This one is a bit more tricky. It's going to store a large number of entries across multiple databases. Most of the data is composed of large text fields (one field in particular is usually a sizeable paragraph). A number of scripts will read from this and may have to parse different parts of the data. I was wondering if an SQL DB with multiple tables would be best for this, or if some of the DB modules would handle this okay. (As a small side note, I don't know SQL, but learning it is not an issue if it would be the best option.)

Thanks in advance for your advice!

Replies are listed 'Best First'.
Re: Which databases to use?
by jZed (Prior) on Aug 17, 2005 at 20:04 UTC
    If you are going to stick with a DBM format, you are probably best off going with BerkeleyDB since it can easily handle large text fields and has a variety of locking options. You might also look at DBM::Deep as an alternative. OTOH, if you use a relational database system you can leverage the many modules already created to do things like the CGI interaction, locking, etc. Going with a relational database doesn't necessarily mean using SQL, since any DBI-accessible database can be accessed with multilevel hashes or objects via modules like Tie::DBI, AnyData, or Class::DBI. Learning SQL won't hurt either your career prospects or your code reuse (though it may warp your mind). If you need to transition your text file to a relational database, modules like DBD::CSV and DBD::AnyData can make that pretty painless.
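
    For example, a rough DBD::CSV sketch (untested; it assumes you've given the port file a header row and renamed it ports.csv, and the directory and column names are invented):

        use strict;
        use warnings;
        use DBI;

        # Treat a directory of CSV files as a database; each file is a table.
        my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
            f_dir      => '/var/lib/ports',   # hypothetical directory
            f_ext      => '.csv',             # table "ports" maps to ports.csv
            RaiseError => 1,
        });

        my $sth = $dbh->prepare(
            'SELECT switch, interface, vlan FROM ports WHERE switch = ?'
        );
        $sth->execute('sw1');
        while (my $row = $sth->fetchrow_hashref) {
            print "$row->{switch} $row->{interface} vlan $row->{vlan}\n";
        }
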
Re: Which databases to use?
by zentara (Cardinal) on Aug 18, 2005 at 13:23 UTC
    DBD::SQLite would be great too. It has basic SQL abilities, and it's fast and self-contained... no server to run. The Perl module also bundles the SQLite engine itself, so you don't need a separate SQLite binary installed (although one is useful for testing).
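
    Something along these lines (just a sketch; the table and data are made up for illustration):

        use strict;
        use warnings;
        use DBI;

        # The whole database is a single file on disk -- no server needed.
        my $dbh = DBI->connect('dbi:SQLite:dbname=questions.db', '', '',
                               { RaiseError => 1 });

        $dbh->do(q{
            CREATE TABLE questions (
                id       INTEGER PRIMARY KEY,
                category TEXT,
                question TEXT,
                answer   TEXT
            )
        });

        $dbh->do('INSERT INTO questions (category, question, answer) VALUES (?, ?, ?)',
                 undef, 'History', 'This man was the first U.S. president.',
                 'George Washington');

        my $rows = $dbh->selectall_arrayref(
            'SELECT question, answer FROM questions WHERE category = ?',
            { Slice => {} }, 'History');
        print "$_->{answer}: $_->{question}\n" for @$rows;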

    I'm not really a human, but I play one on earth. flash japh
Re: Which databases to use?
by dragonchild (Archbishop) on Aug 18, 2005 at 15:07 UTC
    jZed is absolutely correct - DBM::Deep is an excellent option if you're looking at a DBM-type solution. It's also a good fit for most of the cases where you might reach for SQLite, since SQLite doesn't really provide the "relational" part of an RDBMS. (IMHO, it's just a less-featureful implementation of SQL::Statement in C.) DBM::Deep would easily handle both of your needs.
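
    For instance, a minimal sketch of your switchport case with DBM::Deep (the file and field names are only illustrative):

        use strict;
        use warnings;
        use DBM::Deep;

        my $ports = DBM::Deep->new(
            file      => 'ports.db',
            locking   => 1,          # flock is handled for you
            autoflush => 1,
        );

        # Nested levels work like an ordinary multilevel hash and can be
        # updated in place -- no fetch/modify/store dance as with MLDBM.
        $ports->{'sw1'}{'Gi0/1'}{vlan}  = 42;
        $ports->{'sw1'}{'Gi0/1'}{speed} = '100full';

        print $ports->{'sw1'}{'Gi0/1'}{vlan}, "\n";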

    Now, your next question will be "When should I use an RDBMS (like MySQL or PostgreSQL) vs. DBM::Deep?". That's a really good question. The answer, luckily enough, is very simple. Use an RDBMS when you have separate parts of your data that have relationships. That's what the R in RDBMS stands for. Otherwise, all you need is a DBMS, which is what DBM::Deep is.

    As for SQL, you can have SQL query pretty much anything you want - that's what DBD::AnyData is for. :-)


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      Thanks! Your reply was very informative. I suppose I'm still a bit confused as to what constitutes "relational" in a database. For instance, in example #2, it's a database of academic questions (ever heard of Quizbowl?) where each entry includes the main category, subcategory, type of question, the question itself, the answer(s), and the author. I will want to do a number of different lookups: by category (or subcategory), by author, and grabbing all the answers (so we can avoid duplicates)... I don't know exactly why, but this seems like a possible candidate for an RDBMS, though I'm really not familiar enough with them to decide. :-/

        Are some of the questions multiple-choice? If so, you have relational data: there is a one-to-many relationship between a question and the multiple possible answers to that question. Do you need to track information about authors? If so, you have relational data, since there is a one-to-many relationship between one author and the multiple questions they have written.
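
        As a rough illustration, that data might be laid out in three tables (a sketch only; the table and column names are just guesses at your fields):

            use strict;
            use warnings;
            use DBI;

            my $dbh = DBI->connect('dbi:SQLite:dbname=quizbowl.db', '', '',
                                   { RaiseError => 1 });

            $dbh->do(q{
                CREATE TABLE authors (
                    author_id INTEGER PRIMARY KEY,
                    name      TEXT
                )
            });
            $dbh->do(q{
                CREATE TABLE questions (
                    question_id INTEGER PRIMARY KEY,
                    author_id   INTEGER,   -- many questions per author
                    category    TEXT,
                    subcategory TEXT,
                    qtype       TEXT,
                    body        TEXT       -- the sizeable paragraph
                )
            });
            $dbh->do(q{
                CREATE TABLE answers (
                    answer_id   INTEGER PRIMARY KEY,
                    question_id INTEGER,   -- many answers per question
                    answer      TEXT
                )
            });

            # "grab all the answers so we can avoid duplicates"
            my $dupes = $dbh->selectall_arrayref(
                'SELECT answer, COUNT(*) FROM answers GROUP BY answer HAVING COUNT(*) > 1'
            );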

        But whether the data fits the relational model is not, IMNSHO, the only thing to consider. You also need to consider who is going to write the code that does things like adding new data, removing old data, querying data, displaying result sets in various formats, etc.

        These operations are standard for RDBMS systems regardless of the complexity of the data, so there already exist many solutions and you don't need to re-invent inserts, updates, deletes, display, etc. yourself. Do those solutions exist for non-RDBMS systems? Well, they can certainly be recreated with Perl; anything can. But I doubt you'll find the breadth of solutions already created in the RDBMS world. In Perl, relational databases are primarily accessed through DBI, which presents a standard interface. This means that many people are already familiar with how to do all of those operations, and that there are many modules that let you do them in whatever coding style you prefer.
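
        The payoff is that the same handful of DBI calls work no matter which back end you pick; only the connect string changes. A sketch, with made-up table and column names:

            use strict;
            use warnings;
            use DBI;

            # dbi:SQLite:..., dbi:mysql:..., dbi:Pg:... -- the rest is identical
            my $dbh = DBI->connect('dbi:SQLite:dbname=quizbowl.db', '', '',
                                   { RaiseError => 1 });

            # insert
            $dbh->do('INSERT INTO questions (category, body) VALUES (?, ?)',
                     undef, 'Science', 'This element has atomic number 79.');

            # update and delete
            $dbh->do('UPDATE questions SET category = ? WHERE question_id = ?',
                     undef, 'Chemistry', 1);
            $dbh->do('DELETE FROM questions WHERE question_id = ?', undef, 1);

            # query
            my $sth = $dbh->prepare('SELECT body FROM questions WHERE category = ?');
            $sth->execute('Science');
            while (my ($body) = $sth->fetchrow_array) {
                print "$body\n";
            }
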
        Note: there is a certain amount of overhead associated with using the relational part of an RDBMS. If you only have a small dataset, say under 1000 rows, the relational overhead is often not worth it and it's easier just to use a DBM such as DBM::Deep.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Which databases to use?
by Anonymous Monk on Aug 18, 2005 at 16:21 UTC