in reply to Infinite loop prevention for spider

As an experiment, for a while I had a link on http://www.stonehenge.com whose target was just "-/", and I put a symlink in the web directory pointing "-" at ".". That meant you could address any page on my site with an arbitrary number of "/-" throwaways, such as "/-/-/-/-/merlyn/columns.html".
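To make the trick concrete, here is a minimal sketch that rebuilds the same layout in a temporary directory and checks that a path padded with "-" components resolves to the original file. The helper names build_trap and resolve are made up for illustration, and the sketch assumes a filesystem that supports symlinks; it is not the actual server setup.

```python
import os
import tempfile

def build_trap(root):
    """Create merlyn/columns.html under root, plus a "-" symlink to "."."""
    os.makedirs(os.path.join(root, "merlyn"))
    page = os.path.join(root, "merlyn", "columns.html")
    with open(page, "w") as fh:
        fh.write("<html>columns</html>")
    # "-" points back at the directory that contains it.
    os.symlink(".", os.path.join(root, "-"))
    return page

def resolve(root, url_path):
    """Map a URL path onto the document root and resolve any symlinks."""
    return os.path.realpath(os.path.join(root, url_path.lstrip("/")))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        page = os.path.realpath(build_trap(root))
        # Both paths name the same file on disk.
        print(resolve(root, "/-/-/-/-/merlyn/columns.html") == page)
```

Because each "-" resolves to the directory it sits in, any number of them collapses to the same file, which is why a spider following the link never runs out of "new" URLs.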

I did this to see what kind of duplicate-rejection algorithms the big indexing spiders use. Most of them recognized rather quickly that the pages were duplicates, but NorthernLights had indexed about 20 levels deep of the same pages before I turned the link off. Bleh!
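I don't know what the big spiders actually did, but one simple way to reject this kind of duplicate is to hash each page body and refuse to index or follow links from a body already seen, with a depth cap as a backstop. The sketch below assumes a hypothetical fetch(url) callable that returns the page body and its absolute links; crawl and max_depth are illustrative names too.

```python
import hashlib

def crawl(start, fetch, max_depth=5):
    """Breadth-first crawl that skips pages whose body was already seen.

    fetch(url) is a hypothetical callable returning (body, links).
    """
    seen_urls = set()
    seen_digests = set()
    indexed = []
    queue = [(start, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in seen_urls or depth > max_depth:
            continue
        seen_urls.add(url)
        body, links = fetch(url)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            # Same content under a different URL: do not index it
            # and do not follow its links.
            continue
        seen_digests.add(digest)
        indexed.append(url)
        queue.extend((link, depth + 1) for link in links)
    return indexed
```

An exact hash only catches byte-identical pages, which is what the "-" symlink produces when the pages use relative links. Pages that differ slightly, for example by an embedded timestamp, would need a fuzzier comparison, and the depth cap is what saves you in that case.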

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
