psypete has asked for the wisdom of the Perl Monks concerning the following question:

Please don't yell at me for stupidity, for if you do it will be for stupidity's sake. I have been developing a small database and have run into a bit of a snag. I want to be able to read in parts of a database (records) without reading in the whole file. I also want it to be able to expand as much as possible, so it can't use fixed-length records. How would I do this? I can read in parts of a database if it has fixed-length records, and I can scan for one record by reading the entire file into memory, but neither is a good alternative for my exponentially expanding, easy-on-memory database.

If anyone has a suggestion or code excerpt, I would be so happy to get it that I would give you props and a quick sponsor credit in my source code. I am developing FreeBASE, and you can message or email me for the source-code URL.

Replies are listed 'Best First'.
Re: The Never-ending Database
by Abigail (Deacon) on Jul 11, 2001 at 15:34 UTC
    Well, this isn't really a Perl question; it's more a general programming question.

    My first question is: why not use an existing solution? Surely that's far easier than rolling your own. dbm files are the first thing that springs to mind. Or an RDBMS, like Postgres, Oracle or Sybase.
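
    For instance, a tied dbm file gives you record-at-a-time access from disk for almost free (an untested sketch; the file name is made up):

        use strict;
        use warnings;
        use Fcntl;      # for O_RDWR, O_CREAT
        use DB_File;

        # Tie a hash to an on-disk hash database; only the records
        # you actually touch are read from disk.
        tie my %db, 'DB_File', 'records.db', O_RDWR|O_CREAT, 0644, $DB_HASH
            or die "Cannot tie records.db: $!";

        $db{somekey} = 'some record data';    # write one record
        print $db{somekey}, "\n";             # read just that record back
        untie %db;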

    The classical way to solve the problem, of course, is to use an index. Often, B-trees are used (search the literature; there is lots of information about B-trees), but other schemes are possible too.
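
    DB_File, for example, will also give you an on-disk B-tree, which keeps the keys sorted (again an untested sketch; here the index maps a key to a byte offset in a separate data file):

        use strict;
        use warnings;
        use Fcntl;
        use DB_File;

        # A B-tree keeps its keys sorted on disk, so lookups stay
        # cheap as the file grows.
        tie my %idx, 'DB_File', 'index.db', O_RDWR|O_CREAT, 0644, $DB_BTREE
            or die "Cannot tie index.db: $!";

        $idx{aardvark} = 1024;    # key => byte offset into the data file
        $idx{zebra}    = 2048;
        print "$_ => $idx{$_}\n" for keys %idx;    # keys come back sorted
        untie %idx;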

    But I guess you already knew that. If you are developing a database, surely you did your research? But then you wouldn't have to ask this question....

    -- Abigail

Re: The Never-ending Database
by arhuman (Vicar) on Jul 11, 2001 at 13:29 UTC
    <NON RDBMS EXPERT'S OPINION>
    For fixed-length records you should probably maintain an index and then use seek
    or sysseek to jump to the correct record in the file (avoiding a scan of the entire file...)
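
    Something like this, for instance (a rough, untested sketch; the record length and file name are made up):

        use strict;
        use warnings;

        my $RECLEN = 64;    # assumed fixed record length in bytes
        my $recno  = 41;    # record number your index gave you

        open my $fh, '<', 'table.dat' or die "Cannot open table.dat: $!";
        binmode $fh;
        # Jump straight to the record instead of scanning the file.
        seek $fh, $recno * $RECLEN, 0 or die "Cannot seek: $!";
        read $fh, my $record, $RECLEN
            or die "Cannot read record $recno: $!";
        close $fh;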

    For variable-length records I would use fixed-size 'chunks' that contain the variable-length records...
    That way you wouldn't have to scan the whole file, only the one 'chunk' that holds your record...
    (Fixed-size chunks allow you to use an optimized structure
    with fixed-size blocks, indexes and seek...)
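
    In code, the chunk idea could look like this (untested sketch; the chunk size and the record format inside a chunk are invented):

        use strict;
        use warnings;

        my $CHUNK = 4096;    # assumed fixed chunk size in bytes

        # The index (not shown) maps a key to a chunk number; we read
        # that one chunk and scan only its records. Assume records
        # never straddle a chunk boundary.
        sub fetch_from_chunk {
            my ($file, $chunk_no, $key) = @_;
            open my $fh, '<', $file or die "Cannot open $file: $!";
            seek $fh, $chunk_no * $CHUNK, 0 or die "Cannot seek: $!";
            read $fh, my $chunk, $CHUNK;
            close $fh;
            # Assume records inside a chunk are "key:value" lines.
            for my $line (split /\n/, $chunk) {
                my ($k, $v) = split /:/, $line, 2;
                return $v if defined $v and $k eq $key;
            }
            return;    # not in this chunk
        }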

    But you can 'feel' that this isn't 'serious'...
    IMHO any decent RDBMS should use an already well-theorized structure
    like a B-tree or B+tree (where data and indexes are mixed in the same file!)...
    </NON RDBMS EXPERT'S OPINION>


    "Only Bad Coders Code Badly In Perl" (OBC2BIP)
Re: The Never-ending Database
by MZSanford (Curate) on Jul 11, 2001 at 15:35 UTC

    I think arhuman definitely has the right idea. If you are looking for speed and want to save memory by not loading the entire file into memory, keeping an index and using seek() is definitely going to be a good bet. What you may want to do is save the database file sorted by its primary keys and use the binary search algorithms from Mastering Algorithms with Perl; this could reduce the amount of data in the index by allowing faster searches through the data itself.
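
    For instance, with fixed-length records sorted by key, a binary search needs only a handful of seeks (untested sketch; the record and key sizes are made up):

        use strict;
        use warnings;

        my $RECLEN = 64;    # assumed record length; key in the first 16 bytes

        sub bsearch_file {
            my ($file, $want) = @_;
            open my $fh, '<', $file or die "Cannot open $file: $!";
            my $size = -s $fh;
            my ($lo, $hi) = (0, int($size / $RECLEN) - 1);
            while ($lo <= $hi) {
                my $mid = int(($lo + $hi) / 2);
                seek $fh, $mid * $RECLEN, 0 or die "Cannot seek: $!";
                read $fh, my $rec, $RECLEN;
                (my $key = substr $rec, 0, 16) =~ s/\s+\z//;   # strip padding
                if    ($key lt $want) { $lo = $mid + 1 }
                elsif ($key gt $want) { $hi = $mid - 1 }
                else  { close $fh; return $rec }          # found it
            }
            close $fh;
            return;    # not found
        }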

    Another thing to keep in mind is some sort of query optimization. By this I mean that requests need to be screened to see if they will require a complete table scan; in that case it may be worth reading in the entire file (rather than going record-by-record with a seek, you could make it one huge sequential I/O operation). Well, not too much new in this, but just some food for thought.
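
    For the full-scan case, one big sequential read could look like this (untested sketch; the filter callback is made up):

        use strict;
        use warnings;

        # When a query can't use the index, slurp the file in one
        # sequential read and grep it, instead of thousands of seeks.
        sub full_scan {
            my ($file, $filter) = @_;
            open my $fh, '<', $file or die "Cannot open $file: $!";
            my $data = do { local $/; <$fh> };    # one sequential read
            close $fh;
            return grep { $filter->($_) } split /\n/, $data;
        }

        # e.g.: my @hits = full_scan('table.dat', sub { $_[0] =~ /foo/ });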


    may the foo be with you
Re: The Never-ending Database
by pmas (Hermit) on Jul 11, 2001 at 17:03 UTC
    I am guessing you are _not_ an expert programmer. I have nothing against reinventing the wheel, if you know what you are doing. My guess is that this is not your case. So my advice is to use a standard module.

    For a database based on flat files, you may want to check DBD::CSV and DBD::File. They implement all the features you are aware of now, and then some. Available on a CPAN mirror near you, or on search.cpan.org.
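
    For instance (untested; the table and column names are invented):

        use strict;
        use warnings;
        use DBI;

        # Each CSV file in ./data is treated as a table.
        my $dbh = DBI->connect('dbi:CSV:f_dir=./data', undef, undef,
                               { RaiseError => 1 });

        my $sth = $dbh->prepare('SELECT name, email FROM users WHERE id = ?');
        $sth->execute(42);
        while (my ($name, $email) = $sth->fetchrow_array) {
            print "$name <$email>\n";
        }
        $dbh->disconnect;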

    And remember, hardly anybody will ever want to use your database if it is not compliant with the DBI standard.

    It is good that you have ambition, but IMO a less ambitious project will better serve your goal of learning Perl. If I had time to spare, I'd go for, e.g., the Daft text adventure project, which is BTW also using an (existing, free) database.

    Hubris might be a virtue for a perl programmer - but not always. There are situations where a perl monk is better served by being humble and learning from the experts. It's your call to decide what is better for you. Just my $0.02.

    pmas

    To make errors is human. But to make a million errors per second, you need a computer.

Re: The Never-ending Database
by bikeNomad (Priest) on Jul 11, 2001 at 19:03 UTC
    BerkeleyDB can give you a lot of flexibility and speed without exposing you to SQL. If you don't have relations between multiple tables (and aren't going to later), it may not be worth learning about DBI and SQL.
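
    For instance (untested sketch; the file name is made up):

        use strict;
        use warnings;
        use BerkeleyDB;

        # Tie a hash to an on-disk B-tree; records are fetched
        # individually, and the file grows as needed.
        tie my %db, 'BerkeleyDB::Btree',
            -Filename => 'records.db',
            -Flags    => DB_CREATE
            or die "Cannot tie records.db: $BerkeleyDB::Error";

        $db{key1} = 'first record';
        print $db{key1}, "\n";
        untie %db;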