It would have to be a pretty stupid crawler to do that. Most of them don't follow links on the same site indefinitely. I know that the Googlebot eventually “gets bored” and I imagine all the major search engines follow the same principles.
This isn't simple courtesy; the bot would be easy to trap otherwise. You could keep it treading water on a site indefinitely by leading it onto a script which generates links back to itself that aren't obviously recognizable as such.
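A minimal sketch of such a trap, as a hypothetical CGI script (the filename and parameter name are made up; any URL that varies on every request works the same way):

    #!/usr/bin/perl
    # trap.cgi - every page links back to this same script with an
    # incremented counter, so a naive crawler never runs out of
    # "new" URLs to fetch.
    use strict;
    use warnings;
    use CGI;

    my $q = CGI->new;
    my $n = ($q->param('n') || 0) + 1;
    print $q->header('text/html');
    print qq{<html><body><a href="trap.cgi?n=$n">next</a></body></html>\n};

A crawler that doesn't cap its crawl depth per site would chase that chain of links forever.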
I'm not really advocating anything in particular; I think thepen's static archive is fine for the job.
But if we wanted to accommodate bots on the live site, it wouldn't be difficult at all, nor would it impose disproportionate traffic. For example, the bots could be instructed to follow the links on section frontpages without indexing those pages' content. On root nodes, they could be given a view with a plain unthreaded list of links to the notes associated with the node, but without the notes' text. On notes, they'd only see the text of the particular note visited; there wouldn't even be a query against the DB to look for replies. That should keep the load pretty moderate, and it would improve indexing too (you don't get bogus hits on nodes where the hit actually appeared in a reply).
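The "follow but don't index" part is just the standard robots meta tag; a sketch of how the per-node-type choice might look (the node type names are invented for illustration, not actual PerlMonks internals):

    #!/usr/bin/perl
    # Hypothetical: emit a robots meta tag depending on node type.
    use strict;
    use warnings;

    my %robots_for = (
        sectionpage => 'noindex,follow',   # crawl the links, skip the content
        rootnode    => 'noindex,follow',   # bots see only a bare list of links
        note        => 'index,nofollow',   # index this note's own text only
    );

    my $node_type = shift(@ARGV) || 'note';
    print qq{<meta name="robots" content="$robots_for{$node_type}">\n};

That plus a stripped-down template for bot requests would cover the scheme described above.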
I don't know if the effort would be justified, but it is entirely feasible.
Makeshifts last the longest.