Well, this isn't really a Perl question, but more a question about how to organize your data. How Google does it is a secret, but I can tell you that Google isn't doing a full-text search against the entire web.

But I can speculate. My guess is that Google has a huge index: an index on words. When it fetches a (new) web page, it makes a list of all the words occurring in the page. For each word, it stores a pointer to that page in its index of words (along with pointers to where in the document the word is found). So, if you search for "the brown cat with a glass eye", it will toss out the common words 'the' and 'a' (and perhaps 'with' as well). For 'brown', 'cat', 'glass', and 'eye' (and perhaps 'with'), it gets the pointers to the pages the words are found in. To find the pages containing all the words, you take the intersection of the different (sub)results.
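To make the idea concrete, here is a minimal sketch in Python of such a word index. It is only an illustration of the guess above, not how Google actually does it. The names `STOP_WORDS`, `tokenize`, `build_index` and `search` are made up for this example, and so are the three sample documents.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real engine would choose its own.
STOP_WORDS = {"the", "a", "with"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [positions of the word in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search(index, query):
    """Return the ids of documents containing every non-stop word of the query."""
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words:
        return set()
    # Start with the rarest word so the working set stays small.
    words.sort(key=lambda w: len(index.get(w, {})))
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result

# Made-up sample documents.
docs = {
    1: "The brown cat with a glass eye sat on the mat.",
    2: "A brown dog chased the cat.",
    3: "Her glass eye was brown, said the cat.",
}
index = build_index(docs)
print(sorted(search(index, "the brown cat with a glass eye")))  # [1, 3]
```

Note that document 3 matches even though the words appear in a different order: the intersection only says that all the words are present, not where.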

Of course, in reality Google will do something much smarter, perhaps not just indexing on single words, but on word pairs or triples, or by using a multilevel index.
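As a rough illustration of what indexing on word pairs could look like, here is a Python sketch. Again, this is speculation made runnable, not a description of any real search engine; `build_pair_index` and `search_pairs` are hypothetical names, and the sample documents are invented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_pair_index(docs):
    """Map each pair of adjacent words to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for pair in zip(words, words[1:]):
            index[pair].add(doc_id)
    return index

def search_pairs(index, phrase):
    """Return documents that contain every adjacent word pair of the phrase."""
    words = tokenize(phrase)
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return set()
    result = set(index.get(pairs[0], set()))
    for pair in pairs[1:]:
        result &= index.get(pair, set())
    return result

# Made-up sample documents.
docs = {
    1: "The brown cat with a glass eye sat on the mat.",
    2: "A brown dog chased the cat.",
    3: "Her glass eye was brown, said the cat.",
}
pair_index = build_pair_index(docs)
print(sorted(search_pairs(pair_index, "brown cat")))  # [1]
print(sorted(search_pairs(pair_index, "glass eye")))  # [1, 3]
```

A pair index is much larger than a single-word index, but it narrows the candidates further. For phrases of three or more words it still only yields candidates, because the pairs might occur in different places in the document.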

But the important point is: if you want to search on words, and you want to search fast, you've got to index on words, and not use full-text searches. And even if you only want to return documents that contain "the brown cat with a glass eye" as an exact phrase, it's a huge win if you limit your full-text search to those documents that contain the words 'brown', 'cat', 'glass' and 'eye'.
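The two-step approach for exact phrases can be sketched in Python as follows: use the word index to pick candidate documents, then run the expensive text scan only on those. The function names, the stop-word list and the sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real engine would choose its own.
STOP_WORDS = {"the", "a", "with"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(docs):
    """Map each word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def phrase_search(docs, index, phrase):
    """Return (matching doc ids, number of documents actually scanned)."""
    phrase_words = tokenize(phrase)
    if not phrase_words:
        return [], 0
    # Step 1: the index narrows the search to documents with all the words.
    candidates = set(docs)
    for word in phrase_words:
        if word not in STOP_WORDS:
            candidates &= index.get(word, set())
    # Step 2: the full text scan runs on the candidates only.
    # Padding with spaces keeps the match on whole-word boundaries.
    needle = " " + " ".join(phrase_words) + " "
    hits = sorted(
        doc_id for doc_id in candidates
        if needle in " " + " ".join(tokenize(docs[doc_id])) + " "
    )
    return hits, len(candidates)

# Made-up sample documents.
docs = {
    1: "The brown cat with a glass eye sat on the mat.",
    2: "A brown dog chased the cat.",
    3: "Her glass eye was brown, said the cat.",
}
index = build_index(docs)
print(phrase_search(docs, index, "the brown cat with a glass eye"))  # ([1], 2)
```

With three documents the saving is trivial, but the ratio is what matters: the scan touches only the documents that survive the intersection, not the whole collection.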

Abigail


In reply to Re: Fulltext DB search: The Need for Speed by Abigail-II
in thread Fulltext DB search: The Need for Speed by jest
