15MB is not a lot of data. A sub-second response seems possible (as another poster notes) if the search uses prebuilt indices. Even without an index, a pure-C search inside a database may fall in that range. You are losing time on I/O; I'd be surprised if a regex-based search over data that is already in memory took as long as you say.

It is true that databases have limited full-text search functionality, but what you are asking for is a LIKE (wildcard) search, plus maybe boolean operators. It really will be a lot easier to use a database.
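
To show what a LIKE search with a boolean operator looks like in practice, here is a minimal sketch. It is written in Python with the bundled sqlite3 module purely for illustration; the table name `records`, the column `body`, and the helper names are my assumptions, not anything from the original poster's setup. The same SQL would work from Perl through DBI.

```python
import sqlite3

def build_db(rows):
    """Load (id, text) rows into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO records (id, body) VALUES (?, ?)", rows)
    return conn

def like_search(conn, include, exclude=None):
    """Return ids whose body contains `include` and, optionally, not `exclude`."""
    sql = "SELECT id FROM records WHERE body LIKE ?"
    params = ["%" + include + "%"]
    if exclude is not None:
        # Boolean operator: AND NOT
        sql += " AND body NOT LIKE ?"
        params.append("%" + exclude + "%")
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, and that any `%` or `_` in the search term acts as a wildcard unless you escape it.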

On the other hand, I've searched 1GB of data without a relational database in 0.1 seconds (using the C++-based htdig behind a mod_perl wrapper). I've also searched 10 megabytes of data with a single index and a regex in about 1-2 seconds, and that was on a 133MHz P2, IIRC.

Typically these speeds are achieved without an RDBMS by precompiling inverted indices (hashes) on the columns (keys) you are interested in. For wildcard searches I have seen a technique that builds a hash containing every substring of every word. In practice, though, maintaining these inverted indices is a pain: they have to be rebuilt periodically, you often end up tweaking mysterious parameters to improve performance, and sometimes there is no wildcard support.
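
The two ideas above (an inverted index keyed by word, and a hash keyed by every substring of every word) can be sketched roughly as follows. This is an illustration under my own assumptions, not the scheme htdig or any particular tool uses: records are a dict of id to text, words are split on whitespace and lowercased, and all function names are made up. Python's dict plays the role of a Perl hash here.

```python
from collections import defaultdict

def build_inverted_index(records):
    """Map each lowercased word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index

def build_substring_index(inverted):
    """Map every substring of every indexed word to the matching record ids.

    This is what makes wildcard lookups a single hash access, at the cost
    of a much larger index (quadratic in word length).
    """
    sub = defaultdict(set)
    for word, ids in inverted.items():
        for i in range(len(word)):
            for j in range(i + 1, len(word) + 1):
                sub[word[i:j]] |= ids
    return sub

def lookup(index, term):
    """Return the sorted record ids stored under `term`, or [] if absent."""
    return sorted(index.get(term.lower(), ()))
```

With the plain inverted index, `lookup(index, "per")` finds nothing unless "per" is a whole word; with the substring index it finds every record that has a word containing "per". The size blowup of the second index is one reason these structures are a pain to maintain.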

So I'd also recommend a database if you can get one. If not, then yes, at the scale you are talking about you ought to be able to get far better performance than you have now by using precompiled (and periodically updated) indices, maybe just with standard Perl data storage modules. But don't step through your current text files a line at a time on every query; that is the job your index generator will do every night.
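
As a rough sketch of that split between a nightly index generator and a fast query path: the generator reads the text file once and stores, for each word, the byte offsets of the lines containing it; a query then loads the stored index and seeks straight to those lines. The file layout (one record per line, UTF-8) and the function names are assumptions of mine. The example is in Python with pickle; in Perl, a module such as Storable or DB_File would fill the same role.

```python
import pickle
from collections import defaultdict

def build_index(data_path, index_path):
    """Nightly job: scan the data file once, recording line offsets per word."""
    index = defaultdict(list)
    with open(data_path, "rb") as f:
        while True:
            offset = f.tell()          # byte position where this line starts
            line = f.readline()
            if not line:
                break
            # set() so a word repeated on one line is recorded only once
            for word in set(line.decode("utf-8").lower().split()):
                index[word].append(offset)
    with open(index_path, "wb") as out:
        pickle.dump(dict(index), out)

def indexed_search(data_path, index_path, word):
    """Query path: load the index and read only the matching lines."""
    with open(index_path, "rb") as f:
        index = pickle.load(f)
    hits = []
    with open(data_path, "rb") as f:
        for offset in index.get(word.lower(), []):
            f.seek(offset)
            hits.append(f.readline().decode("utf-8").rstrip("\r\n"))
    return hits
```

In a long-running process such as mod_perl you would load the index once at startup rather than on every query; it is reloaded per call here only to keep the sketch short.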


In reply to Re: Large Constant Database in Text File by mattr
in thread Large Constant Database in Text File by stephentyler
