Depends on what you want to do.

If you're using the title to select one abstract, and then running regexps on just that one abstract, then depending on how many abstracts you're dealing with, there's probably a better way than scanning a flat file.

If you're running regexps on each one of the abstracts in sequence and that's all you do, flat files are about as fast as you get, IMHO.

Reading between the lines, I suspect the former rather than the latter, in which case for a sufficiently large number of articles (I dunno, a thousand or more?), you can speed up lookups by an order of magnitude or more.

If that's so, then you likely want to separate things into a list of authors, a list of journals, and a list of articles with pointers to authors and journals. A setup like this would let you query articles by author or journal, for instance.
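To make that split concrete, here is a minimal in-memory sketch using plain Perl hashes and arrays. The sample records, the field names (`author_ids`, `journal_id`) and the helper `titles_by_author` are all made up for illustration; your real data will look different.

```perl
use strict;
use warnings;

# Hypothetical sample data; IDs and field names are illustrative only.
my %authors  = ( 1 => { name => 'A. Author' }, 2 => { name => 'B. Writer' } );
my %journals = ( 1 => { title => 'Journal of Examples' } );

# Each article points at its authors and journal by ID instead of
# repeating their names.
my @articles = (
    { title => 'First article',  author_ids => [1],    journal_id => 1,
      abstract => 'Text about foo.' },
    { title => 'Second article', author_ids => [1, 2], journal_id => 1,
      abstract => 'Text about bar.' },
);

# Index by title, so picking one abstract is a hash lookup
# rather than a scan over every article.
my %by_title = map { $_->{title} => $_ } @articles;

# Return the titles of all articles that list the given author ID.
sub titles_by_author {
    my ($author_id) = @_;
    return map  { $_->{title} }
           grep { my $art = $_;
                  scalar grep { $_ == $author_id } @{ $art->{author_ids} } }
           @articles;
}

print "$_\n" for titles_by_author(2);

# Select one abstract by title, then run the regexp on just that one.
my $hit = $by_title{'Second article'};
print "matched\n" if $hit->{abstract} =~ /\bbar\b/;
```

The same shape carries over to database tables more or less directly: one table per list, with the ID fields becoming foreign keys.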

If that appeals to you, then a database may be in order. There are several popular open source database packages to choose from.

Perl modules for database access start with the ubiquitous DBI and range all the way up to relational-database-backed object frameworks like SPOPS and Alzabo. My personal favorite for intuitive ease of use is Class::DBI, which maps object classes to database tables on a one-to-one basis. It's not perfect (yet), but I find myself more productive using it.
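For a rough idea of what plain DBI looks like, here is a small sketch. It assumes DBD::SQLite is installed and uses an in-memory database only so the example stands alone; the table and column names are invented, and journals are left out to keep it short. Any other DBD driver would follow the same pattern with a different connect string.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available. ':memory:' keeps the sketch
# self-contained; a real setup would name a database file or server.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: articles point at authors by ID.
$dbh->do('CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT)');
$dbh->do('CREATE TABLE article (id INTEGER PRIMARY KEY, author_id INTEGER,
                                title TEXT, abstract TEXT)');

$dbh->do('INSERT INTO author (id, name) VALUES (?, ?)', undef, 1, 'A. Author');
my $ins = $dbh->prepare(
    'INSERT INTO article (author_id, title, abstract) VALUES (?, ?, ?)');
$ins->execute(1, 'First article',  'Text about foo.');
$ins->execute(1, 'Second article', 'Text about bar.');

# Query articles by author.
my $titles = $dbh->selectcol_arrayref(
    'SELECT a.title FROM article a JOIN author au ON au.id = a.author_id
      WHERE au.name = ? ORDER BY a.title', undef, 'A. Author');
print "$_\n" for @$titles;

# Select one abstract by title, then run the regexp on just that one.
my ($abstract) = $dbh->selectrow_array(
    'SELECT abstract FROM article WHERE title = ?', undef, 'Second article');
print "matched\n" if defined $abstract && $abstract =~ /\bbar\b/;

$dbh->disconnect;
```

Class::DBI and friends sit on top of this layer, so the SQL above is roughly what they would generate for you.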

OTOH, you may want to keep everything in a single file as you do now, but speed up your searches, in which case you may want to use something like Berkeley DB.
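The usual way to get at a DBM-style file from Perl is a tied hash. The sketch below uses SDBM_File as a stand-in, purely because it ships with Perl; it is not Berkeley DB, and as far as I recall it caps each key/value pair at roughly 1 KB, which is too small for many real abstracts. With DB_File installed, the tie call would look much the same (`tie %h, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_HASH`). The file name and sample records are made up.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core module, standing in for Berkeley DB
use File::Temp qw(tempdir);

# Temporary location so the sketch cleans up after itself.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/abstracts";

# Tie a hash to the on-disk file: keys are titles, values are abstracts.
tie my %abstract_for, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";

$abstract_for{'First article'}  = 'Text about foo.';
$abstract_for{'Second article'} = 'Text about bar.';

# Lookup by title goes straight to the record; no scan of the file.
my $text = $abstract_for{'Second article'};
print "matched\n" if defined $text && $text =~ /\bbar\b/;

untie %abstract_for;
```

This only helps when you look things up by the key (here, the title). Running a regexp over every abstract still means walking every record.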


In reply to Re: Basic data storage question by cleverett
in thread Basic data storage question by dannoura
