Text Database

EyesOnly has asked for the wisdom of the Perl Monks concerning the following question:

Re: Text Database by blakem (Monsignor) on Aug 16, 2001 at 04:33 UTC
Try a Super Search on "flat file" and you'll find some good posts about the topic. The most common opinion is that your data processing requirements will grow beyond what they are today. Updating and maintaining your flat file solution will become more and more of a pain. Eventually, someone (probably you) will need to replace it with a more robust solution anyway. So, why not start with a more advanced solution now and save yourself the trouble down the line? Believe me, it's more fun to learn about SQL or DB files than to kludge another feature onto a home-grown solution. -Blake
Re: Text Database by chromatic (Archbishop) on Aug 16, 2001 at 06:02 UTC
It's boring to write delimiter-parsing code all the time, and sooner or later the delimiter may turn up inside a field. You could switch from commas to pipes to other unlikely characters, but the underlying problem remains. If your needs are simple, it's not too bad. I'd quickly switch to MLDBM, though. If you really must parse things, go for Text::CSV or a similar module.
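A minimal sketch of the delimiter problem described above. The sample record is made up, and the core module Text::ParseWords stands in for Text::CSV only so the snippet runs without installing anything from CPAN; it is not a full CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record: the second field contains the delimiter.
my $line = '1,"Smith, John",42';

# Naive approach: split on every comma, including the one inside the quotes.
my @naive = split /,/, $line;

# Quote-aware approach: parse_line(delimiter, keep_quotes, line).
# With keep_quotes set to 0 the surrounding quotes are dropped from the fields.
my @parsed = parse_line(',', 0, $line);

printf "split:      %d fields\n", scalar @naive;
printf "parse_line: %d fields\n", scalar @parsed;
```

The naive split breaks the name into two fields, while the quote-aware parse keeps it whole. For real data, a dedicated module such as Text::CSV is likely the better choice, as suggested above.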
Re: Text Database by perrin (Chancellor) on Aug 16, 2001 at 06:05 UTC
You won't have transaction support, but you probably don't need it anyway. You might miss the convenient reporting capabilities of SQL, though. Also, if the file grows too much, access will get slow. You could try migrating to DBD::CSV now, to start getting that DBI flavor, and that will make it an easier step to MySQL or Postgres later. Or you could go to something like MLDBM. Both options scale up much better than plain text.
Re: Text Database by clearcache (Beadle) on Aug 16, 2001 at 10:28 UTC
I feel your pain. I, too, have become ridiculously adept at creating text-based databases (among other things, I write custom apps for integration with some "very legacy" products my company still uses for data collection...) My advice (after just doing exactly what you're talking about) is that when you START on a project that you think text files will work just fine for, you need to consider the scope of the project. It's not really something you should be thinking about AFTER you've begun coding...I learned the hard way ;) How much is the project going to grow (in terms of your procedures that work with the data)? If you have to do a lot with the records, you're going to run into one of two situations: either you'll take a performance hit opening, iterating through, and closing the files every time you need to search for some data, OR you'll take a memory hit by loading it all into a hash and searching through the index. The hash cuts down on the opening/while-ing/closing time and improves your appending/inserting/modifying procedures, but it sucks up overhead. (And there's only so much that compressing the string before it ends up in the hash can do for you...it's not really worth it unless your strings are super-long.) Also consider how much the data is going to grow physically. Even if you're not doing anything truly wacky with appending/modifying/inserting records, if the number of records is going to grow, you've got management issues...hashed data or not, it can get to be very tedious. Finally, is your project likely to become part of a larger system? If it's stand-alone and flat files still seem like a good idea, go for it...BUT if you're likely to be integrating with other reporting/content-generating software, go with the database. Just about everything can do an ODBC connection. I agree, to a point, with the posters who suggest developing your relational database skills.
That's very cool and all, but there are some situations where you may get better performance out of a flat file...that bears consideration, as well. Nothing ticks me off more than going to a website that is slow because it UNNECESSARILY generates some of its content from a database. Without knowing too much about your project, I would suggest developing those relational database skills (look into PHP for talking with them!) for future projects and leaving the current project alone. If it works and is not going to grow its database to a tremendous size, you should probably be looking ahead at more interesting projects rather than looking back to fine-tune this one.
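The two lookup strategies weighed in this post can be sketched roughly as follows. The file layout (id|name|city), the sample rows, and the names find_by_scan and load_index are all made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical pipe-delimited data file: id|name|city
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\n" for '1|alice|Oslo', '2|bob|Lima', '3|carol|Rome';
close($out) or die "Cannot close $path: $!";

# Option 1: open, iterate, and close on every lookup.
# Uses little memory, but every search re-reads the file.
sub find_by_scan {
    my ($file, $id) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    while (my $row = <$in>) {
        chomp $row;
        my @fields = split /\|/, $row;
        if ($fields[0] eq $id) {
            close $in;
            return \@fields;
        }
    }
    close $in;
    return;    # no matching record
}

# Option 2: read the file once into a hash keyed on id.
# Lookups are fast afterwards, at the cost of holding every record in memory.
sub load_index {
    my ($file) = @_;
    my %index;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    while (my $row = <$in>) {
        chomp $row;
        my @fields = split /\|/, $row;
        $index{ $fields[0] } = \@fields;
    }
    close $in;
    return \%index;
}

my $hit   = find_by_scan($path, 2);
my $index = load_index($path);
print "scan:  $hit->[1]\n";
print "index: $index->{3}[1]\n";
```

Which option wins depends on how often you search and how large the file gets, which is exactly the scoping question raised above.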
Re: Text Database by jlongino (Parson) on Aug 16, 2001 at 08:18 UTC
It sounds as though what you have works fine. Should you change to more advanced database management techniques? There are several things to consider. Do you have the time? This is probably the determining factor in most cases. If you have the time and it doesn't detract from your regular duties, why not? If not, we all know the old adage "if it ain't broke . . . ". Is the system used for modifying or appending data? Then you would probably want more sophisticated techniques. On the other hand, is it only used for manipulating data to generate reports or create batch update scripts? I have text-based Perl systems that do things like produce class rolls for entire departments, generate Unix accounts/forms for multiple class sections and create site e-mail directories combining 6 or more separate hosts. Some of the data files, like the /etc/passwd files, are stored as colon-delimited text; others have fixed-width fields. The point is, they are already in that format, and some are downloaded regularly from our IBM mainframe. I don't update those files, I mine them. Some of these systems use as many as ten or twelve different files and undergo complex merges and sorts. The longest any of these programs runs is about 15 seconds, with the average more like 5 seconds. Any database system that is used regularly is going to increase in either size or complexity. That is when things start to slow down. My systems became more complex but the original data didn't grow in size. I created some Perl modules to handle the data parsing, and stored complex data structures to disk (using MLDBM and Storable's nstore). These intermediate files are then used by other programs further down the line and load quickly. If time is not an issue, what have you got to lose? Make yourself more marketable by adding DBI, MySQL, or Postgres to your resume! "If the code and the comments disagree, then both are probably wrong." -- Norm Schryer
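A small sketch of the "parse once, store the structure, reload it later" idea, using Storable's nstore and retrieve. The passwd-style sample lines, the choice of fields kept, and the cache file are assumptions for illustration, not the poster's actual code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(nstore retrieve);

# Hypothetical passwd-style lines; only login, uid, and home are kept.
my @lines = (
    'alice:x:1001:100:Alice A.:/home/alice:/bin/sh',
    'bob:x:1002:100:Bob B.:/home/bob:/bin/sh',
);

# Parse the colon-delimited text into a nested data structure.
my %users;
for my $line (@lines) {
    my ($login, undef, $uid, undef, undef, $home) = split /:/, $line;
    $users{$login} = { uid => $uid, home => $home };
}

# Write the parsed structure once, in network byte order
# so the file can be read on hosts with a different architecture.
my (undef, $cache) = tempfile(SUFFIX => '.sto', UNLINK => 1);
nstore(\%users, $cache) or die "Cannot store to $cache: $!";

# A later program can load the structure without re-parsing the text.
my $loaded = retrieve($cache);
print "bob's home: $loaded->{bob}{home}\n";
```

In a real pipeline the cache would live at a fixed path shared by the downstream programs rather than in a temporary file.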
Re: Text Database by trantor (Chaplain) on Aug 16, 2001 at 16:14 UTC
I agree with all the previous posters, and I would also like to point out how painful it can be to handle in-place editing (when record size changes) and concurrent access on text files. If you're using text files for intranet purposes, you've most likely already bumped into concurrent access problems. For the time being, it could be worth using an abstraction layer such as DBD::CSV, or rolling your own. Anyway, I'd seriously consider switching to MLDBM or (better, in my opinion, since the Web is involved) a proper database. Data format conversion is not a concern, as you can write a Perl script that handles that :-) Happy storing! -- TMTOWTDI
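For the concurrent access problem mentioned above, one common stopgap is an advisory lock via flock. This is only a sketch: append_record and the pipe-delimited layout are made up, and the lock protects only against other processes that also call flock on the same file. It does nothing for in-place edits that change record size.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use File::Temp qw(tempfile);

# Append one pipe-delimited record while holding an exclusive advisory lock.
sub append_record {
    my ($path, @fields) = @_;
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    # Another writer may have appended while we waited for the lock.
    seek($fh, 0, SEEK_END)    or die "Cannot seek in $path: $!";
    print {$fh} join('|', @fields), "\n";
    # Closing flushes the buffer and releases the lock.
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Hypothetical data file for the demonstration.
my (undef, $path) = tempfile(UNLINK => 1);
append_record($path, 1, 'alice', 'Oslo');
append_record($path, 2, 'bob',   'Lima');

open(my $in, '<', $path) or die "Cannot open $path: $!";
my @rows = <$in>;
close $in;
print scalar(@rows), " records\n";
```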