When concatenating your files, remove all the newlines so that each file becomes a single line prefixed by the path information. This makes searching for phrases that span lines much simpler and much, much faster.
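A minimal sketch of that flattening step, in Python for illustration. The function names, the tab separator between path and text, and the `*.txt` pattern are all assumptions, not anything from the thread. It also swaps each line break for a single space rather than deleting it outright, so the last word of one line does not fuse with the first word of the next.

```python
import re
from pathlib import Path


def flatten_file(path: Path) -> str:
    """Return one line: the file's path, a tab, then its text with line breaks collapsed."""
    text = path.read_text(encoding="utf-8", errors="replace")
    # Collapse every run of whitespace (newlines included) into one space,
    # so a phrase that was split across lines becomes contiguous.
    one_line = re.sub(r"\s+", " ", text).strip()
    # Assumption: paths contain no tab characters, so a tab is a safe separator.
    return f"{path}\t{one_line}"


def build_index(root: Path, index_file: Path, pattern: str = "*.txt") -> int:
    """Write one flattened line per matching file under root; return the file count."""
    count = 0
    with index_file.open("w", encoding="utf-8") as out:
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                out.write(flatten_file(path) + "\n")
                count += 1
    return count


def search(index_file: Path, phrase: str) -> list[str]:
    """Return the paths of files whose flattened text contains the phrase."""
    hits = []
    with index_file.open(encoding="utf-8") as index:
        for line in index:
            path, _, text = line.rstrip("\n").partition("\t")
            if phrase in text:
                hits.append(path)
    return hits
```

With one record per line, a phrase search is a single pass over the index file, and a hit already carries the path of the file it came from. Give the index file a name that the pattern does not match, or it will be picked up on the next rebuild.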

Excellent point, thanks for that. As for optimizing on the server: I do not have root access, so that may be a hassle, but I will keep it in mind once everything is finalized.

I am sure the regexp approach is feasible on this scale. As it is now, most of the work is using regexps to parse the tags out, which must be done, and it still performs usably fast. However I end up storing the data, searching should be way, way quicker with the tags preprocessed out.
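To make "tags preprocessed out" concrete, here is a rough sketch in Python. The pattern and the `strip_tags` name are hypothetical, not the code actually in use. It assumes simple, well-formed markup where no `>` appears inside an attribute value; a real HTML parser would be the safer choice for anything messier.

```python
import re

# Naive tag matcher: everything from "<" up to the next ">".
# Assumption: simple, well-formed markup with no ">" inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def strip_tags(markup: str) -> str:
    """Return the text of markup with tags removed and whitespace collapsed."""
    # Replace each tag with a space so words on either side of a tag stay separate.
    without_tags = TAG_RE.sub(" ", markup)
    return WS_RE.sub(" ", without_tags).strip()
```

Run once when the data is stored, this moves the tag parsing out of the search path, so each query only has to match against plain text. One trade-off of substituting a space: a tag in the middle of a word splits that word in two.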

In reply to Re^2: What DB style to use with search engine by halfcountplus
in thread What DB style to use with search engine by halfcountplus
