philou has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have quite a bit of data to organize (about 10Mb of text, and 30Mb of binary data), and I am about to make my choice concerning the database type I am going to use.
I know quite well the Berkeley database and Mysql (both accessible through perl modules), but I was wondering which one would be the most efficient.
I appreciate Mysql, because of its command line and SQL language, but all my searchs will be hard coded (say something like grep), and I would prefer not to have to run a permanent daemon.
So my question is, will BerkeleyDB be as performant as Mysql in terms of speed for the kind of data I have to handle.

Thanks,

philou

Replies are listed 'Best First'.
Re: SQL vs Berkeley
by merlyn (Sage) on Feb 11, 2002 at 11:47 UTC
    The amount of data isn't the problem here. It's how you'd be using it. If you can get it into a "hash" model (one key, one row of values), then a DBM is just fine. However, if you'll ever want to "index" some other value in the tuples, you might as well get into the full-on database instead.

    -- Randal L. Schwartz, Perl hacker

Re: SQL vs Berkeley
by LukeyBoy (Friar) on Feb 11, 2002 at 15:16 UTC

    merlyn is totally right, but lemme elaborate. Looking at your types of data (large amounts of text and binary) the Berkeley system is going to be way too small for what you're working with. Think of the Berkeley DB as a hash table that's stored to disk; it's great for some things, but comparing it to a relational database is like comparing a car to an F-16.

    With MySQL you'll be able to store massive amounts of text data in TEXT columns with virtually no limit - you can do the same for binary data using BLOB. Be warned though - coding for MySQL using the DBI driver is a lot trickier than using the Berkeley DB interfaces.

      I would concur with other replies in that MySQL will probably be the best choice for anything other than simple data requirements.

      However, I would differ in my estimation of how much more difficult it will be than working with DBM. Once you get going, it's really not that hard at all.

      My recommendation would be to find someone who has worked with Perl and MySQL before (there are loads of them out there -- check a local Perl Mongers group if you don't have a friend or friend of a friend who can help or send me an e-mail if you are really stuck), have them help you set up a database, and run a sample query.

      It would take about an hour to step through it all from downloading MySQL to getting the results of your first sample query. And using MySQL will give you more real-world practical experience that you can apply later to more complex problems.

      Hope this helps :-)

      I'm not sure that I would call MySQL an F-16. More of a cropduster.
      ()-()
       \"/
        `
Re: SQL vs Berkeley
by trs80 (Priest) on Feb 11, 2002 at 19:00 UTC
    I was intrigued by this post because you seem to prefer the use of the BerkeleyDB and are only considering the MySQL method if you are given compelling enough reason, or at least that is what I inferred.

    We however are not given enough information in my opinion to give you a compelling reason in either direction. You express concerns about running the MySQL daemon, but you really don't get into the nature of the data you are working with.
    The lifespan, stability, and interaction over the lifespan would effect what is the better way to store the information.

    That said, I would recommend the MySQL storage method because I feel/think it offers less/easier modifications during the data lifespan for most record oriented material.