That's actually possible using the existing horizontal scalability of HyperEstraier.

Just set up multiple servers, each crawling a separate part of the web, and configure search to query all nodes at once.
An indexer can query the search index to find out whether some other indexer has already crawled a page (and optionally refresh the content if needed). That way, pages with a larger number of incoming links end up fresher, and you can count those links and use the count in page ranking as well (I hope this idea doesn't violate Google's patent).
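To make the coordination concrete, here is a minimal sketch of the "check before you crawl" step. It is not HyperEstraier's API: `SharedIndex`, `visit`, and the `max_age` threshold are hypothetical names, and the index is just an in-memory dictionary standing in for whatever the real nodes expose. It is written in Python for brevity; the same logic maps directly onto Perl hashes.

```python
class SharedIndex:
    """In-memory stand-in for the search index that every indexer can query."""

    def __init__(self):
        # url -> {"fetched_at": float, "inlinks": int, "owner": str}
        self.pages = {}

    def lookup(self, url):
        return self.pages.get(url)


def visit(index, indexer, url, now, max_age=3600.0):
    """Handle one discovered link to `url` and return the action taken.

    Every discovered link bumps the incoming-link count, so heavily
    linked pages are looked up (and therefore refreshed) more often.
    """
    entry = index.lookup(url)
    if entry is None:
        # Nobody has crawled this page yet: this indexer takes it.
        index.pages[url] = {"fetched_at": now, "inlinks": 1, "owner": indexer}
        return "crawled"

    entry["inlinks"] += 1
    if now - entry["fetched_at"] > max_age:
        # Already indexed, but stale: refresh the stored content.
        entry["fetched_at"] = now
        return "refreshed"
    return "skipped"
```

The `inlinks` counter is the number you would feed into ranking. A real setup would also have to deal with two indexers racing for the same URL, which this sketch ignores.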

I don't have a pointer to a Perl solution for this (other than the CPAN modules, which make every problem 90% done). On the other hand, with the current P2P architecture you can have multiple indexes (for e-mail, documents, etc.) and search over just some or all of them.
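Searching over "some or all" indexes boils down to fanning the query out and merging the hits. The sketch below assumes each index is a plain dictionary of document id to text and scores by simple term count; the `search` function and the index names are made up for illustration and say nothing about how HyperEstraier actually scores or merges results.

```python
def search(indexes, query, names=None):
    """Search the named indexes (all of them when `names` is None).

    Returns (score, index_name, doc_id) tuples, best score first.
    """
    selected = indexes if names is None else {n: indexes[n] for n in names}
    term = query.lower()
    hits = []
    for name, docs in selected.items():
        for doc_id, text in docs.items():
            # Toy relevance score: how many times the term occurs.
            score = text.lower().split().count(term)
            if score:
                hits.append((score, name, doc_id))
    # Highest score first; ties broken by index name, then document id.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits
```

Calling `search(indexes, "search")` queries everything, while `search(indexes, "search", names=["email"])` restricts the query to the e-mail index.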



In reply to Re^2: Writing inverted index code in perl might be overkill by dpavlin
in thread Writing a Search Engine in Perl? by techcode
