in reply to Re: Infinite loop prevention for spider
in thread Infinite loop prevention for spider

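Yes, I had thought about the possible timestamp issue, where a page contains a timestamp outside any link, so two visits to the same page never compare as identical. My current bot removes all the URLs from a page before making its various comparisons, but that might not be enough. I wonder what the usual way of dealing with this is.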
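
One common approach is to fingerprint each page after stripping out the parts that change between visits, and to skip any page whose fingerprint has already been seen. Below is a minimal Python sketch of that idea. It assumes that removing URLs and anything that looks like a date or time of day is enough to make repeat visits compare equal. The names `fingerprint`, `SeenPages`, `URL_RE` and `STAMP_RE` are hypothetical, and the regexes are illustrative rather than exhaustive.

```python
import hashlib
import re

# Hypothetical pattern: absolute URLs and href="..." attributes.
URL_RE = re.compile(r"https?://\S+|href\s*=\s*\"[^\"]*\"", re.IGNORECASE)
# Hypothetical pattern: ISO-style dates (optionally with a time) or a bare time of day.
STAMP_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}(?:[ T]\d{2}:\d{2}(?::\d{2})?)?"
    r"|\b\d{1,2}:\d{2}(?::\d{2})?\b"
)


def fingerprint(html):
    """Return a SHA-1 digest of the page with URLs and timestamps removed."""
    text = URL_RE.sub("", html)          # drop links first
    text = STAMP_RE.sub("", text)        # then drop non-link timestamps
    text = " ".join(text.split())        # collapse leftover whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class SeenPages:
    """Remembers fingerprints of pages the spider has already processed."""

    def __init__(self):
        self._seen = set()

    def is_new(self, html):
        """True the first time a page's fingerprint appears, False afterwards."""
        fp = fingerprint(html)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True
```

For example, two renderings of a calendar page that differ only in the "generated at" time and the link target produce the same fingerprint, so the second one is reported as already seen. Pages with other volatile content (hit counters, random quotes) would need further patterns, which is why this is a heuristic rather than a guarantee.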

There is a new O'Reilly book out called Spidering Hacks. I hope I can find it in a bookstore near me (I'm not sure enough that it would be helpful to spend the money on it sight unseen). And I hope people put the proper entries in their robots.txt files!
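
Those robots.txt entries only help if the spider actually reads them. As a minimal sketch, Python's standard `urllib.robotparser` can parse a robots.txt body and answer whether a given URL may be fetched. The helper name `make_checker`, the user-agent string `MySpider/0.1`, and the example rules and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def make_checker(robots_txt, user_agent):
    """Build a function that says whether user_agent may fetch a URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())    # parse an already-downloaded robots.txt
    return lambda url: rp.can_fetch(user_agent, url)


# Hypothetical site that keeps spiders out of its endless calendar pages.
can_fetch = make_checker("User-agent: *\nDisallow: /calendar/\n", "MySpider/0.1")
print(can_fetch("http://example.com/calendar/2004/01"))  # False
print(can_fetch("http://example.com/index.html"))        # True
```

In a real spider you would download each site's robots.txt once, cache the checker per host, and consult it before queueing a URL.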

Thanks
