tomazos has asked for the wisdom of the Perl Monks concerning the following question:

I want to construct a basic user database on our Linux server and interact with it via web/mail Perl scripts.

My requirements go beyond the basic hash-storing file provided by the DBM routines, but are not so advanced as to require mSQL/MySQL or similar.

Basically I want a file that stores a "two dimensional hash" with the main key being the user's email address, and secondary keys being things like name, serial number, status etc. I want to be able to add/delete secondary keys easily later.

I was thinking of just building this on top of DBM by serializing the secondary hash for each record, but surely someone else has been here before and has code I can reuse?

Anyone know of anything that may address these needs?

Re: Basic Database?
by jeroenes (Priest) on Jun 11, 2001 at 17:56 UTC
    I don't know of any ready-to-go solution. The *SQL way may not be so bad after all. The interfaces are nice, and the performance is a non-issue with modern systems. But if you insist, it's not difficult to implement. Just use a pipe (or any character that is guaranteed not to be used in the fields) to make strings of the 2nd-level info.
        use Some_DB_that_you_like;
        tie %db, 'Some_DB_etc';
        my $two_level_hash = $what_ever_your_data;
        for my $key_one (keys %$two_level_hash) {
            my @vals = values %{ $two_level_hash->{$key_one} };
            my $item = join '|', @vals;
            $db{ $key_one } = $item;
        }
    ... to populate your DB. You can fetch items in a similar fashion with split; see the sketch below.
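    A minimal sketch of that fetch side (the field list here is a made-up example; it must match whatever order the values were joined in, so a real version should join the fields in a fixed, agreed order or store key=value pairs):

        # Sketch only: %db is the tied hash from the snippet above,
        # and the field order is a hypothetical assumption.
        my @fields = qw(name serial status);
        my %record;
        @record{@fields} = split /\|/, $db{'user@example.com'};
        print "$record{name} has status $record{status}\n";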

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)

Re: Basic Database?
by andreychek (Parson) on Jun 11, 2001 at 18:10 UTC
    I think what may help is MLDBM. From the README:
    This is the README file for MLDBM, the Perl module that can be used to store multidimensional hash structures in tied hashes (including DBM files).

    I can't actually check the documentation because <whine> CPAN is down </whine>, but I believe it uses Data::Dumper to store Perl data types in the DB. Good luck!
    -Eric
      Yes, MLDBM does use Data::Dumper to store complex multi-level data structures. This module is also covered (briefly) in the O'Reilly Programming the Perl DBI book; it is from this that the following little example is drawn.
       
        #!/usr/bin/perl -w
        #
        # ch02/mldbmtest: Demonstrates storing complex data structures in a DBM
        #                 file using the MLDBM module.

        use MLDBM qw( DB_File Data::Dumper );
        use Fcntl;

        ### Remove the test file in case it exists already ...
        unlink 'mldbmtest.dat';

        tie my %database1, 'MLDBM', 'mldbmtest.dat', O_CREAT | O_RDWR, 0666
            or die "Can't initialise MLDBM file: $!\n";

        ### Create some megalith records in the database
        %database1 = (
            'Avebury' => {
                name     => 'Avebury',
                mapref   => 'SU 103 700',
                location => 'Wiltshire'
            },
            'Ring of Brodgar' => {
                name     => 'Ring of Brodgar',
                mapref   => 'HY 294 133',
                location => 'Orkney'
            }
        );

        ### Untie and retie to show data is stored in the file
        untie %database1;
        tie my %database2, 'MLDBM', 'mldbmtest.dat', O_RDWR, 0666
            or die "Can't initialise MLDBM file: $!\n";

        ### Dump out via Data::Dumper what's been stored ...
        print Data::Dumper->Dump( [ \%database2 ] );

        untie %database2;
        exit;

       
      Another way I have stored complex data structures before is with the Storable module - This wonderful module even allowed me to store user preference hashes for a web site in a cookie!
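      For what it's worth, a minimal Storable sketch (nfreeze/thaw are Storable's real interface; the Base64 step is just one illustrative way to make the value cookie-safe, and the preference fields are made up):

        use Storable qw(nfreeze thaw);
        use MIME::Base64 qw(encode_base64 decode_base64);

        my %prefs = ( theme => 'dark', lang => 'en' );

        # Serialize to a compact binary string (nfreeze is portable across machines)
        my $frozen = nfreeze(\%prefs);

        # A cookie value must be printable, so Base64-encode it (no newlines)
        my $cookie_value = encode_base64($frozen, '');

        # ... later, recover the hash from the cookie value ...
        my $restored = thaw(decode_base64($cookie_value));
        print $restored->{theme}, "\n";    # prints "dark"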
       
      Ooohhh, Rob no beer function well without!
Re: Basic Database?
by Spudnuts (Pilgrim) on Jun 11, 2001 at 18:33 UTC
    I would recommend using an SQL server for a few reasons.
    • file manipulation is somewhat unpleasant, and doesn't scale well
    • a database will make it easier for your servers to be separated (fewer replication worries) as your organization grows
    • a database makes it somewhat easier to separate your code from your data (access from another language or tool is easier)
    • a database can provide data checking automatically for you
    There are probably many more good reasons.
      One more reason to use an SQL database, rather important IMHO:

      You will learn something new and important, and the market value of your skills will increase. An SQL database is the standard way to store data in the industry, and SQL is portable enough that next time you will be ready to store data in Oracle.

      From another point of view, learning Storable or the other proposed solutions is great if you are already familiar with SQL databases.

      Obviously I assume you have time to try and learn new tools.

      Given enough time, and if you are not familiar enough with databases, I would recommend going the SQL database route.

      It will be a pain to digest database concepts and learn a new (SQL) language, but it will benefit you immensely. The database itself looks simple and straightforward (just one table), excellent for your first project. And maybe you can think up some related tables and JOIN them. This allows you to join the SQL DB club... : )
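      To make that concrete, here is a minimal sketch of the one-table route using DBI (the database name, credentials, and column layout are made up for illustration; it assumes the DBD::mysql driver is installed):

        use DBI;

        # Connect (database name, user, and password are hypothetical)
        my $dbh = DBI->connect('DBI:mysql:database=userdb', 'user', 'secret',
                               { RaiseError => 1 });

        # One simple table: the primary key is the email address
        $dbh->do(q{
            CREATE TABLE IF NOT EXISTS users (
                email  VARCHAR(255) PRIMARY KEY,
                name   VARCHAR(255),
                serial INT,
                status VARCHAR(32)
            )
        });

        # Insert and fetch with placeholders
        $dbh->do('INSERT INTO users (email, name, serial, status) VALUES (?, ?, ?, ?)',
                 undef, 'someone@example.com', 'Someone', 1, 'active');

        my ($name, $status) = $dbh->selectrow_array(
            'SELECT name, status FROM users WHERE email = ?',
            undef, 'someone@example.com');
        print "$name is $status\n";

        $dbh->disconnect;

      Adding or dropping a "secondary key" later is then just an ALTER TABLE away.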

      pmas


      To make errors is human. But to make a million errors per second, you need a computer.
Re: Basic Database?
by gildir (Pilgrim) on Jun 11, 2001 at 19:14 UTC
    There are two possible solutions:

    *DBM
    Use a dbm file with embedded, serialized data structures. You may use MLDBM for this purpose, but as it uses Data::Dumper it could be somewhat slow for your purposes (see Serialization uncovered). But the real pain here is indexes. *DBM allows you only one key, so you must maintain several other *DBM files with mappings of secondary keys to primary keys, and you must keep them consistent. And if you want to do a substring search, there is no easy method except a linear search, which can be terribly slow even for a small database. One more problem with *DBM files is parallel access: you cannot open a *DBM file for writing in two processes, and if you do an open/close scenario, data cannot be cached and access is very inefficient.

    database/directory
    Use an RDBMS (Oracle, MySQL, ...). Choose a lightweight one and it will have little impact on performance compared with the programming comfort you gain. Or, even better, use a directory service like LDAP. There is an excellent open-source LDAP server: OpenLDAP. It will do all the dirty index/schema/caching things for you. There are at least two modules on CPAN for LDAP access (Net::LDAP and perldap) and both work fine for me.
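    For illustration, a minimal Net::LDAP sketch (the host, base DN, and filter are hypothetical; a real directory would need its own schema for attributes like serial and status):

        use Net::LDAP;

        # Connect and bind (host is hypothetical; this is an anonymous bind)
        my $ldap = Net::LDAP->new('ldap.example.com') or die "$@";
        $ldap->bind;

        # Look a user up by mail attribute
        my $result = $ldap->search(
            base   => 'ou=People,dc=example,dc=com',
            filter => '(mail=someone@example.com)',
        );
        die $result->error if $result->code;

        for my $entry ($result->entries) {
            printf "%s (%s)\n", $entry->get_value('cn'), $entry->get_value('mail');
        }

        $ldap->unbind;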

      Only one comment:
      And if you want to do a substring search, there is no easy method except for linear search. And that could be terribly slow even for small database.
      In DB_File under the section Matching Partial Keys we see:

      Matching Partial Keys

      The BTREE interface has a feature which allows partial keys to be matched. This functionality is only available when the seq method is used along with the R_CURSOR flag.
      $x->seq($key, $value, R_CURSOR) ;
      Here is the relevant quote from the dbopen man page where it defines the use of the R_CURSOR flag with seq:
      Note, for the DB_BTREE access method, the returned key is not necessarily an exact match for the specified key. The returned key is the smallest key greater than or equal to the specified key, permitting partial key matches and range searches.
      In the example script below, the match sub uses this feature to find and print the first matching key/value pair given a partial key.
        use strict ;
        use DB_File ;
        use Fcntl ;

        use vars qw($filename $x %h $st $key $value) ;

        sub match
        {
            my $key = shift ;
            my $value = 0;
            my $orig_key = $key ;
            $x->seq($key, $value, R_CURSOR) ;
            print "$orig_key\t-> $key\t-> $value\n" ;
        }

        $filename = "tree" ;
        unlink $filename ;

        $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0640, $DB_BTREE
            or die "Cannot open $filename: $!\n";

        # Add some key/value pairs to the file
        $h{'mouse'} = 'mickey' ;
        $h{'Wall'}  = 'Larry' ;
        $h{'Walls'} = 'Brick' ;
        $h{'Smith'} = 'John' ;

        $key = $value = 0 ;
        print "IN ORDER\n" ;
        for ($st = $x->seq($key, $value, R_FIRST) ;
             $st == 0 ;
             $st = $x->seq($key, $value, R_NEXT) )
          { print "$key -> $value\n" }

        print "\nPARTIAL MATCH\n" ;

        match "Wa" ;
        match "A" ;
        match "a" ;

        undef $x ;
        untie %h ;
      Here is the output:
                 IN ORDER
                 Smith -> John
                 Wall  -> Larry
                 Walls -> Brick
                 mouse -> mickey
      
                 PARTIAL MATCH
                 Wa -> Wall  -> Larry
                 A  -> Smith -> John
                 a  -> mouse -> mickey
      
      So, although I don't know how DB_File implements that internally, it does seem to give you a little more functionality (or at least a more elegant interface) than a plain linear search. Comments?
        The returned key is the smallest key greater than or equal to the specified key

        That allows you to search for foo*, but not for *foo or *foo*. A substring search is *foo*-like, but you are right about the 'added functionality'. In some cases even a foo* search can be sufficient.

Re: Basic Database?
by bikeNomad (Priest) on Jun 11, 2001 at 19:31 UTC
    BerkeleyDB supports transactions and locking. You can also have non-unique keys. And, as others have suggested, you can use something like Storable to serialize the contents of the hashes you store if you need to.

    Joins are not yet supported by the Perl module (though the underlying Sleepycat BerkeleyDB supports them, and the manpage says that support is in progress). However, you could use the fine-grained locking and multiple tables to do joins manually.
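    For what it's worth, a minimal sketch of the tied BerkeleyDB interface combined with Storable (the file name and record fields are made up for illustration, and this skips the environment setup you would need for real transactions and locking):

        use BerkeleyDB;
        use Storable qw(nfreeze thaw);

        # Tie a hash to a Berkeley DB file (plain hash access, no transactions here)
        tie my %db, 'BerkeleyDB::Hash',
            -Filename => 'users.db',
            -Flags    => DB_CREATE
            or die "Cannot open users.db: $! $BerkeleyDB::Error\n";

        # Serialize the per-user hash by hand on the way in ...
        $db{'someone@example.com'} = nfreeze({
            name   => 'Someone',
            serial => 42,
            status => 'active',
        });

        # ... and thaw it on the way out
        my $user = thaw($db{'someone@example.com'});
        print "$user->{name} -> $user->{status}\n";

        untie %db;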

Re: Basic Database?
by extremely (Priest) on Jun 12, 2001 at 06:20 UTC
    If you are on Windows, or really want it to remain human-readable text files, then I can highly recommend you try out DBD::CSV under DBI. You'll have to learn a bit of SQL, but you wind up with a database that is all text and simple to peek into. As a benefit, you wind up learning how simple DBI is to get into, which can only help you on the road you are about to head down.
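    For example, a minimal DBD::CSV session might look like this (the f_dir directory and table layout are made up for illustration):

        use DBI;

        # Each table is a plain CSV file in f_dir (directory is hypothetical)
        my $dbh = DBI->connect('DBI:CSV:f_dir=/tmp/userdb', undef, undef,
                               { RaiseError => 1 });

        $dbh->do('CREATE TABLE users (email CHAR(64), name CHAR(64), status CHAR(16))');
        $dbh->do(q{INSERT INTO users VALUES ('a@example.com', 'Alice', 'active')});

        my $sth = $dbh->prepare('SELECT name FROM users WHERE email = ?');
        $sth->execute('a@example.com');
        while (my ($name) = $sth->fetchrow_array) {
            print "$name\n";
        }

        $dbh->disconnect;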

    --
    $you = new YOU;
    honk() if $you->love(perl)

      Um, my own 2 cents, but I'm not sure about "highly". It is cool, but my experience was not so hot. On the other hand, I have had a wonderful experience with MySQL.

      I actually was going to write a comment saying, "Don't use DBD::CSV. You will go insane. Okay?".

      It actually is quite nice and I was delighted with DBD::CSV at first. But I found weird bugs crawling in, and though I still have a project using it, I find it a little fragile and unscalable. True, I implemented it soon after it came out, so maybe it is much better now. But you also want to be careful with things that could break CSV, especially (sounds dumb, but...) vertical tabs from Excel. In fact, Excel is evil. View something in Excel if you like, but if you must edit in Windows, use Access! Excel's CSV generation is broken.

      I suppose Windows vs. Unix, and doing it all in Japanese, didn't help, so maybe it was a bit of a stress test. I also hacked SQL/HTML v1.0 by Jonathan A. Zdziarski to work with DBD::CSV for a quick internal admin interface to the db, which was pushing things a bit. Also, if you think you have a unique key, you will have to strongly enforce its uniqueness every step of the way. That said, it works for very basic things. The best use of it, I would think, is if you periodically received a CSV file and wanted to deal with it in SQL. I really enjoyed how easy it was to write the search algorithms with SQL. But keep backups, and go look at (and possibly edit) the raw file periodically, please.

      That said, DBD::CSV does work, and you can see it working in the event calendar of the Royal Netherlands Embassy in Japan. (The free-text engine is a different algorithm, not SQL.)

      There is another thing: you could have some flat file or other which you slurp into memory and then manipulate with DBD::RAM. I keep trying to find some reason to try working with it... Hey, I just installed it from CPAN; it's back up again! I wonder which is faster, DBD::CSV or slurp + DBD::RAM?

      Running test.pl from DBD::CSV and then from DBD::RAM (edited):

          Testing empty loop speed ...
          CSV: 100000 iterations in 0.2 cpu+sys seconds (476190 per sec)
          RAM: 100000 iterations in 0.2 cpu+sys seconds (416666 per sec)
          Testing connect/disconnect speed ...
          CSV: 2000 connections in 1.6 cpu+sys seconds (1265 per sec)
          RAM: 2000 connections in 1.3 cpu+sys seconds (1503 per sec)
          Testing CREATE/DROP TABLE speed ...
          CSV: 500 files in 1.8 cpu+sys seconds (273 per sec)
          RAM: 500 files in 1.0 cpu+sys seconds (480 per sec)
          Testing INSERT speed ...
          CSV: 500 rows in 1.1 cpu+sys seconds (450 per sec)
          RAM: 500 rows in 0.6 cpu+sys seconds (793 per sec)
          Testing SELECT speed ...
          CSV: 100 single rows in 8.7 cpu+sys seconds (11.5 per sec)
          RAM: 100 single rows in 6.7 cpu+sys seconds (14.9 per sec)
          Testing SELECT speed (multiple rows) ...
          CSV: 100 times 100 rows in 10.7 cpu+sys seconds (9.4 per sec)
          RAM: 100 times 100 rows in 8.8 cpu+sys seconds (11.3 per sec)

      Updated: Yow! Turns out DBD::RAM works not only with local CSV files, but also with files accessible via http and ftp, and it even works on arrays and hashes? Woohoo! I think I like the idea of using DBD::RAM better than DBD::CSV, because I can migrate data to a different format if I decide it needs to be binary, and I don't fool myself into thinking my flat file is an SQL database engine. It seems you can separate the SQL query interface and the data I/O to some degree. Maybe a separate Perl process to maintain that in memory, plus IPC? Interested to hear from anyone with experience with DBD::RAM.

      Funny, it says in the man page that you can have it write to disk on every modification of the db. I wonder if that would satisfy some people who are worried about MySQL putatively not saving data quickly.

Re: Basic Database?
by Anonymous Monk on Jun 11, 2001 at 23:45 UTC
    I use MLDBM to store a fairly nested hash and it works rather well. Two points to consider:

    1. USE STORABLE, not Data::Dumper!!! Storable is about an order of magnitude faster for my tasks. The serialization package that MLDBM uses is customizable; see the docs (and the sketch after these points).

    2. Consider your memory needs. I've found that not only does the serialization package incur a significant memory overhead (Data::Dumper being the worst offender as far as I can tell), MLDBM adds another 10-20% on top.
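    As promised above, a minimal sketch of swapping in Storable (the two-argument import is how MLDBM selects its underlying DBM and serializer; the file name and record fields are hypothetical):

        use MLDBM qw(DB_File Storable);   # DB_File underneath, Storable to serialize
        use Fcntl;

        tie my %users, 'MLDBM', 'users.dbm', O_CREAT | O_RDWR, 0666
            or die "Cannot open users.dbm: $!\n";

        # Whole nested hashes go in and out transparently
        $users{'someone@example.com'} = { name => 'Someone', status => 'active' };
        print $users{'someone@example.com'}{status}, "\n";

        untie %users;

    One caveat: MLDBM stores and fetches whole top-level values, so modifying a nested value in place will not persist; fetch the record into a temporary, change it, and store it back.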

    The wisdom to consider SQL is valid. Take a look at DBIx::Recordset for a module that will allow you to work with your database using native Perl hashes and not have to mess with DBI or SQL ugliness.
Re: Basic Database?
by mr.dunstan (Monk) on Jun 12, 2001 at 02:45 UTC
    Using an SQL database will make this SO much easier!! If you don't know SQL, MySQL, etc., this might be painful at first, but you will reap the rewards later on!! If you have the time to spend on it, go with MySQL or Postgres and get ready to rumble... uh, well, not rumble, but have an easier time.