When you say "database", do you mean a SQL database?

If so, then you need to do some additional work to index the database (unless you happen to have a full-text search engine, of course).

For a very simple system you can do something like this:

create table words (
    doc_id int,          -- or numeric?
    word   varchar(32)
)
For each document, extract the words (skipping common words such as "I", "the", etc.) and store each (doc_id, word) combination in the words table.
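A minimal sketch of this indexing step, using Python's built-in sqlite3 module as a stand-in for whatever SQL database you actually have. The stop-word list and the helper names `extract_words` and `index_document` are hypothetical, chosen only for illustration.

```python
import re
import sqlite3

# Hypothetical stop-word list; a real system would use a larger one.
STOP_WORDS = {"i", "the", "a", "an", "and", "of", "to", "in", "is", "it"}


def extract_words(text):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def index_document(conn, doc_id, text):
    """Store one (doc_id, word) row per distinct word in the document."""
    rows = [(doc_id, w) for w in sorted(set(extract_words(text)))]
    conn.executemany("insert into words (doc_id, word) values (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("create table words (doc_id int, word varchar(32))")
index_document(conn, 1, "I think the hairline is receding")
print(conn.execute("select word from words where doc_id = 1 order by word").fetchall())
```

Storing each word only once per document matches the "(doc_id, word) combination" idea; the count and position extensions described further down relax that.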

Now you can search your documents with something like this:

select d.doc_text, d.doc_id
from documents d, words w
where w.doc_id = d.doc_id
  and w.word in (<list of words to match>)
Now you have all the documents that contain at least one of the words you want to search on (one row per matching word). You can then apply additional logic, for example to keep only the documents that contain all of the words, or to find phrases (e.g. "receding hairline").
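A small sketch of the search, again using Python's sqlite3 as a stand-in database with a tiny made-up data set. It runs the query above and then shows one possible piece of "additional logic": a `group by`/`having` variant that keeps only documents containing every search word. The grouping approach is an assumption on my part, not the only way to do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tiny hypothetical data set, already indexed as described above.
conn.executescript("""
    create table documents (doc_id int primary key, doc_text text);
    create table words (doc_id int, word varchar(32));
    insert into documents values
        (1, 'my hairline is receding'),
        (2, 'the tide is receding'),
        (3, 'a new hairline');
    insert into words values
        (1, 'my'), (1, 'hairline'), (1, 'receding'),
        (2, 'tide'), (2, 'receding'),
        (3, 'new'), (3, 'hairline');
""")

search = ["hairline", "receding"]
placeholders = ", ".join("?" for _ in search)

# The query from the text: one row per (document, matching word),
# so a document containing several search words appears several times.
any_rows = conn.execute(
    f"""select d.doc_text, d.doc_id
        from documents d, words w
        where w.doc_id = d.doc_id
          and w.word in ({placeholders})
        order by d.doc_id""",
    search,
).fetchall()

# One possible refinement: keep only documents in which every
# distinct search word was found.
all_rows = conn.execute(
    f"""select d.doc_text, d.doc_id
        from documents d, words w
        where w.doc_id = d.doc_id
          and w.word in ({placeholders})
        group by d.doc_id, d.doc_text
        having count(distinct w.word) = ?""",
    [*search, len(set(search))],
).fetchall()

print(any_rows)
print(all_rows)
```

Using `?` placeholders rather than pasting the words into the SQL string keeps user input out of the query text.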

To help rank documents, you can extend this by adding a count column, a position column, or both to the words table. The position column holds the offset of the word in the document; you can use it to handle "near" queries and also to rank documents.
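A rough sketch of the position idea, using Python's sqlite3 and a hypothetical extended schema that stores one row per word occurrence together with its offset. With that layout a per-document count is just `count(*)`, so the sketch derives it instead of storing a separate count column. Stop-word filtering is omitted for brevity, and the helpers `index_positions`, `near` and `rank` are illustrative names only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical extended schema: one row per word occurrence, with its offset.
conn.execute("create table words (doc_id int, word varchar(32), position int)")


def index_positions(conn, doc_id, text):
    """Store every word occurrence together with its 0-based offset."""
    rows = [(doc_id, w, pos) for pos, w in enumerate(text.lower().split())]
    conn.executemany("insert into words values (?, ?, ?)", rows)


def near(conn, word1, word2, max_distance):
    """Return doc_ids where word1 and word2 occur within max_distance words."""
    sql = """
        select distinct a.doc_id
        from words a, words b
        where a.doc_id = b.doc_id
          and a.word = ? and b.word = ?
          and abs(a.position - b.position) <= ?
        order by a.doc_id
    """
    return [r[0] for r in conn.execute(sql, (word1, word2, max_distance))]


def rank(conn, search_words):
    """Rank documents by how many times the search words occur in them."""
    placeholders = ", ".join("?" for _ in search_words)
    sql = f"""
        select doc_id, count(*) as hits
        from words
        where word in ({placeholders})
        group by doc_id
        order by hits desc, doc_id
    """
    return conn.execute(sql, list(search_words)).fetchall()


index_positions(conn, 1, "receding hairline treatments for a receding hairline")
index_positions(conn, 2, "the hairline of the coast is receding")
index_positions(conn, 3, "a receding tide")

print(near(conn, "receding", "hairline", 1))
print(near(conn, "receding", "hairline", 5))
print(rank(conn, ["receding", "hairline"]))
```

For an exact phrase rather than "near", the distance test could become `b.position = a.position + 1`.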

Michael


In reply to Re: Search a database for every permutation of a list of words by mpeppler
in thread Search a database for every permutation of a list of words by jfrancis
