psypete has asked for the wisdom of the Perl Monks concerning the following question:

Please don't yell at me for stupidity, for if you do it will be for stupidity's sake. I have been developing a small database and have run into a bit of a snag. I want to be able to read in parts of a database (records) without reading in the whole file. I also want it to be able to expand as much as possible, so it can't use fixed-length records. How would I do this? I can read in parts of a database if it has fixed-length records, and I can scan for one record by reading the entire file into memory, but neither is a good alternative for my exponentially expanding, easy-on-memory database.

If anyone has a suggestion or code excerpt, I would be so happy to get it that I would give you props and a quick sponsor credit in my source code. I am developing FreeBASE, and you can message or email me for the source-code URL.

Replies are listed 'Best First'.
Re: The Never-ending Database
by Abigail (Deacon) on Jul 11, 2001 at 15:34 UTC
    Well, this isn't really a Perl question; it's more a general programming question.

    My first question is: why not use an existing solution? Surely that's far easier than rolling your own. dbm files are the first thing that springs to mind. Or an RDBMS, like Postgres, Oracle or Sybase.
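
    For instance, a tied dbm file gives you record-at-a-time access from disk for almost free (an untested sketch; the file name is made up):

        use strict;
        use warnings;
        use Fcntl;      # for O_RDWR, O_CREAT
        use DB_File;

        # Tie a hash to an on-disk hash database; only the records
        # you actually touch are read from disk.
        tie my %db, 'DB_File', 'records.db', O_RDWR|O_CREAT, 0644, $DB_HASH
            or die "Cannot tie records.db: $!";

        $db{somekey} = 'some record data';    # write one record
        print $db{somekey}, "\n";             # read just that record back
        untie %db;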

    The classical way to solve the problem, of course, is to use an index. Often, B-trees are used (search the literature; there is lots of information about B-trees), but other schemes are possible too.
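
    DB_File, for example, will also give you an on-disk B-tree, which keeps the keys sorted (again an untested sketch; here the index maps a key to a byte offset in a separate data file):

        use strict;
        use warnings;
        use Fcntl;
        use DB_File;

        # A B-tree keeps its keys sorted on disk, so lookups stay
        # cheap as the file grows.
        tie my %idx, 'DB_File', 'index.db', O_RDWR|O_CREAT, 0644, $DB_BTREE
            or die "Cannot tie index.db: $!";

        $idx{aardvark} = 1024;    # key => byte offset into the data file
        $idx{zebra}    = 2048;
        print "$_ => $idx{$_}\n" for keys %idx;    # keys come back sorted
        untie %idx;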

    But I guess you already knew that. If you are developing a database, surely you did your research? But then you wouldn't have to ask this question....

    -- Abigail

Re: The Never-ending Database
by arhuman (Vicar) on Jul 11, 2001 at 13:29 UTC
    <NON RDBMS EXPERT'S OPINION>
    For fixed-length records you should probably maintain an index and then use seek
    or sysseek to jump to the correct record in the file (avoiding a scan of the entire file...)
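
    Something like this, for instance (a rough, untested sketch; the record length and file name are made up):

        use strict;
        use warnings;

        my $RECLEN = 64;    # assumed fixed record length in bytes
        my $recno  = 41;    # record number your index gave you

        open my $fh, '<', 'table.dat' or die "Cannot open table.dat: $!";
        binmode $fh;
        # Jump straight to the record instead of scanning the file.
        seek $fh, $recno * $RECLEN, 0 or die "Cannot seek: $!";
        read $fh, my $record, $RECLEN
            or die "Cannot read record $recno: $!";
        close $fh;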

    For variable-length records I would use fixed-size 'chunks' that contain the variable-length records...
    That way you wouldn't have to scan the whole file, only the one 'chunk' that holds your record...
    (Fixed-size chunks allow you to use an optimized structure
    with fixed-size blocks, indexes and seek...)
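
    In code, the chunk idea could look like this (untested sketch; the chunk size and the record format inside a chunk are invented):

        use strict;
        use warnings;

        my $CHUNK = 4096;    # assumed fixed chunk size in bytes

        # The index (not shown) maps a key to a chunk number; we read
        # that one chunk and scan only its records. Assume records
        # never straddle a chunk boundary.
        sub fetch_from_chunk {
            my ($file, $chunk_no, $key) = @_;
            open my $fh, '<', $file or die "Cannot open $file: $!";
            seek $fh, $chunk_no * $CHUNK, 0 or die "Cannot seek: $!";
            read $fh, my $chunk, $CHUNK;
            close $fh;
            # Assume records inside a chunk are "key:value" lines.
            for my $line (split /\n/, $chunk) {
                my ($k, $v) = split /:/, $line, 2;
                return $v if defined $v and $k eq $key;
            }
            return;    # not in this chunk
        }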

    But you can 'feel' that this isn't 'serious'...
    IMHO any decent RDBMS should use an already well-theorized structure
    like a B-tree or B+tree (where data and indexes are mixed in the same file!)...
    </NON RDBMS EXPERT'S OPINION>


    "Only Bad Coders Code Badly In Perl" (OBC2BIP)
Re: The Never-ending Database
by MZSanford (Curate) on Jul 11, 2001 at 15:35 UTC

    I think arhuman definitely has the right idea. If you are looking for speed and want to save memory by not loading the entire file into memory, keeping an index and using seek() is definitely going to be a good bet. What you may want to do is save the database file sorted by its primary keys and use the binary search algorithms from Mastering Algorithms with Perl; this could reduce the amount of data in the index by allowing faster searches through the data itself.
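
    For instance, with fixed-length records sorted by key, a binary search needs only a handful of seeks (untested sketch; the record and key sizes are made up):

        use strict;
        use warnings;

        my $RECLEN = 64;    # assumed record length; key in the first 16 bytes

        sub bsearch_file {
            my ($file, $want) = @_;
            open my $fh, '<', $file or die "Cannot open $file: $!";
            my $size = -s $fh;
            my ($lo, $hi) = (0, int($size / $RECLEN) - 1);
            while ($lo <= $hi) {
                my $mid = int(($lo + $hi) / 2);
                seek $fh, $mid * $RECLEN, 0 or die "Cannot seek: $!";
                read $fh, my $rec, $RECLEN;
                (my $key = substr $rec, 0, 16) =~ s/\s+\z//;   # strip padding
                if    ($key lt $want) { $lo = $mid + 1 }
                elsif ($key gt $want) { $hi = $mid - 1 }
                else  { close $fh; return $rec }          # found it
            }
            close $fh;
            return;    # not found
        }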

    Another thing to keep in mind is some sort of query optimization. By this I mean that requests need to be screened to see if they will require a complete table scan; in that case it may be worth reading in the entire file (rather than going record-by-record with a seek, you could make it one huge sequential I/O operation). Well, not too much new in this, but just some food for thought.
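
    For the full-scan case, one big sequential read could look like this (untested sketch; the filter callback is made up):

        use strict;
        use warnings;

        # When a query can't use the index, slurp the file in one
        # sequential read and grep it, instead of thousands of seeks.
        sub full_scan {
            my ($file, $filter) = @_;
            open my $fh, '<', $file or die "Cannot open $file: $!";
            my $data = do { local $/; <$fh> };    # one sequential read
            close $fh;
            return grep { $filter->($_) } split /\n/, $data;
        }

        # e.g.: my @hits = full_scan('table.dat', sub { $_[0] =~ /foo/ });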


    may the foo be with you
Re: The Never-ending Database
by pmas (Hermit) on Jul 11, 2001 at 17:03 UTC
    I am guessing you are _not_ an expert programmer. I have nothing against reinventing the wheel, if you know what you are doing. My guess is that this is not your case. So my advice is to use a standard module.

    For a database based on flat files, you may want to check DBD::CSV and DBD::File. They implement all the features you are aware of now, and then some. Available on a CPAN mirror near you, or on search.cpan.org.
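
    For instance (untested; the table and column names are invented):

        use strict;
        use warnings;
        use DBI;

        # Each CSV file in ./data is treated as a table.
        my $dbh = DBI->connect('dbi:CSV:f_dir=./data', undef, undef,
                               { RaiseError => 1 });

        my $sth = $dbh->prepare('SELECT name, email FROM users WHERE id = ?');
        $sth->execute(42);
        while (my ($name, $email) = $sth->fetchrow_array) {
            print "$name <$email>\n";
        }
        $dbh->disconnect;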

    And remember, hardly anybody will ever want to use your database if it is not compliant with the DBI standard.

    It is good that you have ambition, but IMO a less ambitious project will better serve your goal of learning Perl. If I had time to spare, I'd go for, e.g., the Daft text adventure project, which is BTW also using an (existing, free) database.

    Hubris might be a virtue for a perl programmer - but not always. There are situations where a perl monk is better served by being humble and learning from the experts. It's your call to decide what is better for you. Just my $0.02.

    pmas

    To make errors is human. But to make a million errors per second, you need a computer.

Re: The Never-ending Database
by bikeNomad (Priest) on Jul 11, 2001 at 19:03 UTC
    BerkeleyDB can give you a lot of flexibility and speed without exposing you to SQL. If you don't have relations between multiple tables (and aren't going to later), it may not be worth learning about DBI and SQL.
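
    For instance (untested sketch; the file name is made up):

        use strict;
        use warnings;
        use BerkeleyDB;

        # Tie a hash to an on-disk B-tree; records are fetched
        # individually, and the file grows as needed.
        tie my %db, 'BerkeleyDB::Btree',
            -Filename => 'records.db',
            -Flags    => DB_CREATE
            or die "Cannot tie records.db: $BerkeleyDB::Error";

        $db{key1} = 'first record';
        print $db{key1}, "\n";
        untie %db;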