in reply to Basic data storage question
If you're using the title to select one abstract, and then running regexps on just that abstract, there's probably a better way, depending on how many abstracts you're dealing with.
If you're running regexps on each one of the abstracts in sequence and that's all you do, flat files are about as fast as you get, IMHO.
Reading between the lines, I suspect the former rather than the latter, in which case, for a sufficiently large number of articles (I dunno, a thousand or more?), you can speed things up by an order of magnitude or more.
If that's so, then you likely want to separate things into a list of authors, a list of journals, and a list of articles with pointers to authors and journals. A setup like this would let you query articles by author or journal, for instance.
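A minimal sketch of that layout, assuming SQLite through Python's standard sqlite3 module purely for illustration (the same SQL could be sent from Perl through DBI). The table names, column names, and sample rows are all hypothetical, and giving each article a single author is a simplification; several authors per article would need a separate link table.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with hypothetical tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE journals (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE articles (
            id         INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            abstract   TEXT NOT NULL,
            author_id  INTEGER NOT NULL REFERENCES authors(id),
            journal_id INTEGER NOT NULL REFERENCES journals(id)
        );
        -- Indexes so lookups by title or author avoid a full scan.
        CREATE INDEX articles_by_title  ON articles(title);
        CREATE INDEX articles_by_author ON articles(author_id);
    """)
    # Made-up sample data, only here so the queries return something.
    conn.execute("INSERT INTO authors VALUES (1, 'A. Author')")
    conn.execute("INSERT INTO authors VALUES (2, 'B. Writer')")
    conn.execute("INSERT INTO journals VALUES (1, 'Journal of Examples')")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
        [(1, "First title", "Abstract one.", 1, 1),
         (2, "Second title", "Abstract two.", 2, 1),
         (3, "Third title", "Abstract three.", 1, 1)],
    )
    return conn

def titles_by_author(conn, author_name):
    """Return the titles of all articles by the named author, sorted."""
    rows = conn.execute(
        "SELECT a.title FROM articles a "
        "JOIN authors au ON au.id = a.author_id "
        "WHERE au.name = ? ORDER BY a.title",
        (author_name,),
    )
    return [row[0] for row in rows]

def abstract_for_title(conn, title):
    """Fetch one abstract by exact title, or None if there is no such article."""
    row = conn.execute(
        "SELECT abstract FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None
```

With something like this, you pick out the one abstract via `abstract_for_title` and run your regexps on that string alone, rather than reading the whole file to find it.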
If that appeals to you, then a database may be in order; there are several popular open source database packages to choose from.
Perl modules for database access start with the ubiquitous DBI and range all the way up to relational-DB-backed object frameworks like SPOPS and Alzabo. My personal favorite for intuitive ease of use is Class::DBI, which maps object classes to database tables on a one-to-one basis. It's not perfect (yet), but I find myself more productive using it.
OTOH, you may want to keep things all in a single file as you do now, but speed up your searches, in which case you may want to use something like Berkeley DB.
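To illustrate the keyed-lookup idea, here is a rough sketch that uses Python's standard dbm module as a stand-in for Berkeley DB. It is not Berkeley DB itself, and which on-disk backend dbm picks depends on the platform. The file path, titles, and abstracts are hypothetical. The point is only that fetching one abstract by title becomes a key lookup instead of a scan through the file.

```python
import dbm
import os
import re
import tempfile

def store_abstracts(path, abstracts):
    """Write a {title: abstract} dict into a keyed file at `path`."""
    with dbm.open(path, "c") as db:  # "c": create the file if it is missing
        for title, abstract in abstracts.items():
            # dbm stores bytes, so encode both key and value.
            db[title.encode("utf-8")] = abstract.encode("utf-8")

def abstract_matches(path, title, pattern):
    """Look up one abstract by title and test it against a regexp."""
    with dbm.open(path, "r") as db:
        raw = db.get(title.encode("utf-8"))
    if raw is None:
        return False  # no abstract stored under that title
    return re.search(pattern, raw.decode("utf-8")) is not None

if __name__ == "__main__":
    # Hypothetical data in a throwaway directory.
    path = os.path.join(tempfile.mkdtemp(), "abstracts")
    store_abstracts(path, {"First title": "Perl and regexps.",
                           "Second title": "Nothing relevant."})
    print(abstract_matches(path, "First title", r"regexps?"))
```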
Re: Re: Basic data storage question
by Anonymous Monk on Jul 30, 2003 at 03:40 UTC
by cleverett (Friar) on Jul 31, 2003 at 02:49 UTC