Re: Basic data storage question

Depends on what you want to do.

If you're using the title to select one abstract and then running regexps on just that abstract, then depending on how many abstracts you're dealing with, there's probably a better way.

If you're running regexps on every abstract in sequence and that's all you do, flat files are about as fast as it gets, IMHO.
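For the sequential case, a plain loop over the file is hard to beat. Here's a rough sketch, assuming a hypothetical layout of one article per line as title<TAB>abstract (your actual format is probably different); the in-memory filehandle just keeps it runnable without a real file:

```perl
use strict;
use warnings;

# Hypothetical flat-file layout: one article per line, "title<TAB>abstract".
# Scan every abstract in turn and collect the titles whose abstract matches.
sub scan_abstracts {
    my ($fh, $pattern) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        my ($title, $abstract) = split /\t/, $line, 2;
        next unless defined $abstract;    # skip malformed lines
        push @hits, $title if $abstract =~ $pattern;
    }
    return @hits;
}

# Usage with an in-memory "file" so the sketch runs anywhere.
my $data = "Gene A\tProtein folding in yeast\nGene B\tKinase signalling\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @matches = scan_abstracts($fh, qr/yeast/);
print "$_\n" for @matches;
```

Every search reads the whole file once, which is fine when you really do want to look at every abstract anyway.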

Reading between the lines, I suspect the former, in which case, for a sufficiently large number of articles (I dunno, a thousand or more?), you can speed things up by an order of magnitude or more.
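If it is the former, the usual trick is to pay for one pass that builds an index keyed on title, then make each lookup a hash access instead of a scan. A minimal sketch, using the same hypothetical tab-separated layout; note it only pays off if the index is built once and reused for many lookups (say, in a long-running process), since building it still reads the whole file:

```perl
use strict;
use warnings;

# Build a title => abstract index in one pass over the (hypothetical)
# tab-separated flat file, so later lookups are a hash access instead of
# a scan through every line.
sub build_title_index {
    my ($fh) = @_;
    my %abstract_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($title, $abstract) = split /\t/, $line, 2;
        $abstract_for{$title} = $abstract if defined $abstract;
    }
    return \%abstract_for;
}

my $data = "Gene A\tProtein folding in yeast\nGene B\tKinase signalling\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $index = build_title_index($fh);

# Select one abstract by title, then run the regexp on just that one.
my $abstract = $index->{'Gene B'};
my $found = (defined $abstract && $abstract =~ /kinase/i) ? 1 : 0;
print "match\n" if $found;
```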

If that's so, you likely want to split the data into a list of authors, a list of journals, and a list of articles, where each article points to its authors and its journal. A setup like this lets you query articles by author or by journal, for instance.
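In plain Perl data structures, that split might look something like this (the names and ids are hypothetical; a real database would store the same thing as tables linked by foreign keys):

```perl
use strict;
use warnings;

# Hypothetical normalized layout: authors and journals are stored once,
# and each article refers to them by id instead of repeating the text.
my %authors  = (1 => 'Smith', 2 => 'Jones');
my %journals = (10 => 'Nature', 11 => 'Cell');
my @articles = (
    { title => 'Gene A', journal_id => 10, author_ids => [1, 2] },
    { title => 'Gene B', journal_id => 11, author_ids => [2] },
    { title => 'Gene C', journal_id => 10, author_ids => [1] },
);

# Query articles by author: look up the author's id, then follow the
# author_ids "pointers" stored on each article.
sub titles_by_author {
    my ($name) = @_;
    my @ids = grep { $authors{$_} eq $name } keys %authors;
    return () unless @ids;
    my $id = $ids[0];
    my @titles;
    for my $article (@articles) {
        push @titles, $article->{title}
            if grep { $_ == $id } @{ $article->{author_ids} };
    }
    return @titles;
}

# Query articles by journal: resolve each article's journal_id to a name.
sub titles_in_journal {
    my ($name) = @_;
    my @titles;
    for my $article (@articles) {
        push @titles, $article->{title}
            if $journals{ $article->{journal_id} } eq $name;
    }
    return @titles;
}

print join(', ', titles_by_author('Smith')), "\n";
print join(', ', titles_in_journal('Cell')), "\n";
```

Renaming a journal then means changing one hash entry rather than every article that mentions it.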

If that appeals to you, then a database may be in order. Popular open source database packages include:

MySQL: the speed king
PostgreSQL: also really good
SQLite: good for small applications

Perl modules for database access range from the ubiquitous DBI all the way up to object frameworks backed by relational databases, such as SPOPS and Alzabo. My personal favorite for intuitive ease of use is Class::DBI, which maps each object class to one database table. It's not perfect (yet), but I find myself more productive using it.
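To give a flavor of plain DBI, here's a sketch of the same author/journal/article split as SQLite tables. It assumes DBD::SQLite is installed and uses an in-memory database; the table and column names are made up for illustration:

```perl
use strict;
use warnings;
use DBI;

# Assumes DBD::SQLite is installed; ":memory:" keeps the sketch self-contained.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE journal (id INTEGER PRIMARY KEY, name TEXT)');
$dbh->do('CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT)');
$dbh->do('CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT,
          abstract TEXT, journal_id INTEGER REFERENCES journal(id))');
# Link table: an article can have many authors and vice versa.
$dbh->do('CREATE TABLE article_author (article_id INTEGER, author_id INTEGER)');

# Placeholders (?) keep quoting out of the SQL strings.
$dbh->do('INSERT INTO journal (id, name) VALUES (?, ?)', undef, @$_)
    for [10, 'Nature'], [11, 'Cell'];
$dbh->do('INSERT INTO author (id, name) VALUES (?, ?)', undef, @$_)
    for [1, 'Smith'], [2, 'Jones'];
$dbh->do('INSERT INTO article (id, title, abstract, journal_id) VALUES (?, ?, ?, ?)',
         undef, @$_)
    for [100, 'Gene A', 'Protein folding in yeast', 10],
        [101, 'Gene B', 'Kinase signalling', 11];
$dbh->do('INSERT INTO article_author VALUES (?, ?)', undef, @$_)
    for [100, 1], [100, 2], [101, 2];

# An index on title lets the database find one article without a full scan.
$dbh->do('CREATE INDEX article_title ON article (title)');

# Select one abstract by title.
my ($abstract) = $dbh->selectrow_array(
    'SELECT abstract FROM article WHERE title = ?', undef, 'Gene B');

# Query articles by author by joining through the link table.
my $by_author = $dbh->selectcol_arrayref(q{
    SELECT a.title FROM article a
    JOIN article_author aa ON aa.article_id = a.id
    JOIN author au ON au.id = aa.author_id
    WHERE au.name = ? ORDER BY a.title
}, undef, 'Jones');

print "$abstract\n";
print "$_\n" for @$by_author;
$dbh->disconnect;
```

The regexp can then run on just the abstract that came back, which is the "select one, then match" case from above.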

OTOH, if you'd rather keep everything in a single file as you do now but speed up your searches, something like Berkeley DB may be what you want.
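A rough DB_File sketch follows. DB_File is Perl's interface to Berkeley DB and needs the libdb library to be available; the file name and the title => abstract layout are hypothetical:

```perl
use strict;
use warnings;
use Fcntl;                 # O_RDWR, O_CREAT, O_RDONLY
use DB_File;               # Perl interface to Berkeley DB; exports $DB_BTREE
use File::Temp qw(tempdir);

# Hypothetical on-disk index: title => abstract, stored in a Berkeley DB
# B-tree file so a single title can be fetched without reading every record.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/abstracts.db";

tie my %abstract_for, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_BTREE
    or die "Cannot tie $file: $!";

$abstract_for{'Gene A'} = 'Protein folding in yeast';
$abstract_for{'Gene B'} = 'Kinase signalling';
untie %abstract_for;       # flush and close the file

# Later (or in another process): reopen read-only and look up one title.
tie my %lookup, 'DB_File', $file, O_RDONLY, 0644, $DB_BTREE
    or die "Cannot reopen $file: $!";
my $abstract = $lookup{'Gene B'};
print "match\n" if defined $abstract && $abstract =~ /kinase/i;
```

A B-tree ($DB_BTREE) keeps the keys sorted, which also allows range scans; $DB_HASH would do for plain exact-match lookups.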


Re: Re: Basic data storage question by Anonymous Monk on Jul 30, 2003 at 03:40 UTC
"MySQL: the speed king"? Since when? According to whom?
Re: Re: Re: Basic data storage question by cleverett (Friar) on Jul 31, 2003 at 02:49 UTC
Sorry ... don't want to start a religious war.