in reply to perl regex or module that identifies bots/crawlers

Google and Yahoo should certainly be honoring your robots.txt file. Take a closer look at which IP addresses these requests are coming from and which URLs they are fetching; perhaps there is another path to your cgi-bin directory that isn't covered by your robots.txt file, or perhaps an error is preventing the file from being processed correctly.
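
For the "closer look", something along these lines might be a starting point. It is only a sketch: it assumes the server writes Apache "combined" format logs, the sample lines and addresses are made up, and the sub name summarize_bot_hits is just an illustrative choice. It counts requests per IP address and path for any client that claims to be Googlebot or Yahoo! Slurp.

```perl
use strict;
use warnings;

# Assumption: Apache "combined" log format.
# Captures: client IP, request path, user-agent string.
my $LOG_LINE = qr{
    ^(\S+)\s+\S+\s+\S+\s+        # client IP, identd, user
    \[[^\]]+\]\s+                # timestamp
    "\S+\s+(\S+)[^"]*"\s+        # method, path, protocol
    \d{3}\s+\S+\s+               # status, bytes sent
    "[^"]*"\s+                   # referer
    "([^"]*)"                    # user agent
}x;

# Names the well-known crawlers put in their user-agent strings.
my $CLAIMED_BOT = qr/Googlebot|Yahoo!\s+Slurp/i;

# Returns a hash ref: $hits->{$ip}{$path} = number of requests.
sub summarize_bot_hits {
    my (@lines) = @_;
    my %hits;
    for my $line (@lines) {
        my ($ip, $path, $agent) = $line =~ $LOG_LINE or next;
        next unless $agent =~ $CLAIMED_BOT;
        $hits{$ip}{$path}++;
    }
    return \%hits;
}

# Made-up sample lines (documentation-range addresses).
my @sample = (
    '192.0.2.10 - - [20/Mar/2007:22:10:01 +0000] "GET /cgi-bin/search.pl?q=a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [20/Mar/2007:22:10:05 +0000] "GET /cgi-bin/search.pl?q=a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [20/Mar/2007:22:11:00 +0000] "GET /scripts/form.cgi HTTP/1.0" 200 99 "-" "Googlebot/2.1"',
    '198.51.100.7 - - [20/Mar/2007:22:12:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
);

my $hits = summarize_bot_hits(@sample);
for my $ip (sort keys %$hits) {
    for my $path (sort keys %{ $hits->{$ip} }) {
        printf "%-15s %3d  %s\n", $ip, $hits->{$ip}{$path}, $path;
    }
}
```

Any path in the output that you believed was disallowed is worth comparing against your robots.txt rules, and any unfamiliar IP address is worth checking separately.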

Re^2: perl regex or module that identifies bots/crawlers
by Anno (Deacon) on Mar 20, 2007 at 22:22 UTC
    I agree that the real Google and Yahoo crawlers, and those of the other big search engines, will certainly honor robots.txt. If bots using their names invade a server, that may only indicate that these are popular fake names for rogue bots. For a rogue bot, it makes sense to pose as a legitimate crawler rather than as, for instance, a browser.
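
    One way to tell a real crawler from an impostor is a reverse DNS lookup on the requesting address, followed by a forward lookup to confirm that the hostname maps back to the same address. The following is a rough sketch, not a definitive implementation: it handles IPv4 only, the sub names are illustrative, and the domain list is an assumption that should be checked against each search engine's current documentation.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

# Assumption: domains the genuine crawlers' hostnames end in.
# Verify against the search engines' own documentation.
my @CRAWLER_DOMAINS = qw(googlebot.com google.com crawl.yahoo.net);

# Pure string check: is $host a subdomain of a trusted crawler domain?
sub host_in_crawler_domain {
    my ($host) = @_;
    return 0 unless defined $host && length $host;
    $host = lc $host;
    $host =~ s/\.\z//;    # drop a trailing root dot, if any
    for my $domain (@CRAWLER_DOMAINS) {
        return 1 if $host =~ /\.\Q$domain\E\z/;
    }
    return 0;
}

# Reverse lookup, domain check, then forward lookup to confirm
# that the hostname really resolves back to the same address.
sub is_verified_crawler {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return 0;
    my $host = gethostbyaddr($packed, AF_INET);
    return 0 unless host_in_crawler_domain($host);

    # List context: (name, aliases, addrtype, length, @packed_addrs)
    my @forward = gethostbyname($host) or return 0;
    my @addrs = map { inet_ntoa($_) } @forward[4 .. $#forward];
    return (grep { $_ eq $ip } @addrs) ? 1 : 0;
}

# Usage: perl verify_bot.pl <ip-address>   (needs working DNS)
if (@ARGV) {
    my $ip = $ARGV[0];
    print "$ip: ", (is_verified_crawler($ip) ? "verified crawler" : "not verified"), "\n";
}
```

    The forward lookup matters because whoever controls an IP block also controls its reverse DNS, so a reverse lookup alone can be made to return any name.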

    That said, it is certainly a good idea to check if robots.txt is working as it should.

    Anno