It would have to be a pretty stupid crawler to do that. Most of them don't follow links on the same site indefinitely. I know that the Googlebot eventually “gets bored” and I imagine all the major search engines follow the same principles.

This isn't simple courtesy: the bot would be easy to trap otherwise. You could keep it treading water on a site indefinitely by leading it onto a script that generates links back to itself which aren't obviously self-links.
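
To make the "gets bored" idea concrete, here is a minimal Python sketch of a per-site page budget. It is only my guess at the general principle, not how Googlebot or any real crawler is implemented; `crawl`, `trap_links`, `max_pages` and the `trap.example` host are all made up for illustration.

```python
from collections import deque

def crawl(start, get_links, max_pages=50):
    """Breadth-first crawl of one site that stops after max_pages fetches.

    get_links is a stand-in for "fetch the page and extract its links".
    Returns the number of pages fetched.
    """
    seen = {start}
    queue = deque([start])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        for link in get_links(url):
            # Deduplication alone does not help against a trap:
            # every generated URL looks new.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

def trap_links(url):
    """Hypothetical trap script: each page links to a 'new' page on the same site."""
    n = int(url.rsplit("/", 1)[1])
    return [f"http://trap.example/{n + 1}"]

if __name__ == "__main__":
    # Without the budget this loop would never end.
    print(crawl("http://trap.example/0", trap_links, max_pages=50))
```

The point is that the `seen` set is no defence by itself, since the trap never repeats a URL; it is the budget that gets the bot off the site.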

I'm not really advocating anything in particular; I think thepen's static archive is fine for the job.

But if we wanted to accommodate bots on the live site, it wouldn't be at all difficult, nor would it impose disproportionate traffic. For example, the bots could be instructed to follow links from section frontpages but not index their content. On root nodes, they could be given a view with a plain unthreaded list of links to the notes associated with the node, but without the notes' text. On notes, they'd see only the text of the particular note visited; there wouldn't even be a query against the DB to look for replies. That should keep the load pretty moderate and would improve indexing too (you don't get bogus hits on nodes where the hit appeared in a reply).
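
Roughly, the three bot views could be picked like this. This is a Python sketch under my own assumptions, not anything the site's engine actually does: the node dicts, the `kind` values, `bot_view` and the `replies_of` lookup are all hypothetical. The `robots` strings are the usual values for a robots meta tag.

```python
def bot_view(node, replies_of):
    """Return the stripped-down page a bot would be served for a node.

    node is a dict with 'id', 'kind', 'text' and (for sections) 'children'.
    replies_of(node_id) stands in for the DB query that finds a node's notes.
    """
    kind = node["kind"]
    if kind == "section":
        # Section frontpage: follow the links, but don't index the page itself.
        return {"robots": "noindex, follow",
                "text": None,
                "links": list(node["children"])}
    if kind == "root":
        # Root node: its own text plus a flat list of links to its notes,
        # without any of the notes' text.
        return {"robots": "index, follow",
                "text": node["text"],
                "links": list(replies_of(node["id"]))}
    # A note: only its own text, and no reply lookup at all.
    return {"robots": "index, follow", "text": node["text"], "links": []}
```

Since a note's view never calls `replies_of`, a bot visiting a note costs one node fetch and nothing more, and the note's text only ever appears on its own page.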

I don't know if the effort would be justified, but it is entirely feasible.

Makeshifts last the longest.


In reply to Re^5: Google indexes Perlmonks by Aristotle
in thread Google indexes Perlmonks by bart
