in reply to Re: Automating a search using perl
in thread Automating a search using perl

Thanks. I was actually just using that as an example, but as a matter of interest, how would Google know that someone was automating searches against its engine? Does it check which IPs visit Google the most?

Re: Re: Re: Automating a search using perl
by Ovid (Cardinal) on Apr 29, 2002 at 20:07 UTC

    Checking IPs is an iffy solution at best. Since HTTP is essentially stateless, they can't prove that any one individual is necessarily guilty of anything. However, Google recently requested that a module be removed from the CPAN for violating their terms of service. I think that's fair, and as a matter of being a good netizen, it's appropriate to respect their restrictions.

    Here's the relevant section from their terms of service (which is why you should use their SOAP interface - it's really easy).


    No Automated Querying

    You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that "sending automated queries" includes, among other things:

    • using any software which sends queries to Google to determine how a website or webpage "ranks" on Google for various queries;
    • "meta-searching" Google; and
    • performing "offline" searches on Google.

    Please do not write to Google to request permission to "meta-search" Google for a research project, as such requests will not be granted.


    Cheers,
    Ovid


Re: Re: Re: Automating a search using perl
by hossman (Prior) on Apr 29, 2002 at 23:00 UTC
    There are lots of things sites can do to decide whether to block traffic. IPs are only one example, and blocking by IP is not common unless the admins believe that IP has deliberately attempted to DoS them or in some other way jeopardize their site. For example, if you try to crawl http://shopping.yahoo.com/ to get all of their product data to build your own shopping portal, they will probably block your IP, whether you are doing it in a very low-intensity way or not.
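
    To make the "which IPs visit the most" idea concrete, here is a rough sketch of how a site might count requests per IP over a sliding time window. It is written in Python only for brevity. The class name, the window length and the request limit are all made up for illustration; nothing here describes what Google, Yahoo or any other site actually runs.

```python
from collections import defaultdict, deque


class PerIpRateTracker:
    """Hypothetical helper: track request timestamps per IP in a sliding window."""

    def __init__(self, window_seconds=60.0, max_requests=100):
        # Both thresholds are illustrative assumptions, not real-world values.
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one request; return True if this IP is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


tracker = PerIpRateTracker(window_seconds=10, max_requests=3)
for second in range(5):
    over_limit = tracker.record("192.0.2.1", float(second))
    print(second, over_limit)
```

    A counter like this only flags an address. As noted above, many people can sit behind one IP, so a busy address on its own proves little about any single user.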

    More generally, sites can analyze the "signature" of requests to decide whether to block you. By signature I mean anything that can make your requests stand out from those of the other 99% of their traffic. They might do it based on your User-Agent, or some other HTTP header that is unique to the API you are using, or they might do it based on some combination of things that help identify people who are being deceitful. If your User-Agent says you're Netscape 6 but you use "HTTP/1.0", that's a dead giveaway. Other, more subtle things might be discrepancies between the HTTP headers you send and the headers that Netscape 6 ALWAYS sends.
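
    The Netscape 6 example can be sketched as a consistency check between what a request claims to be and how it behaves. This is only an illustration in Python: the function name, the profile table, and the list of headers a real Netscape 6 supposedly always sends are assumptions made up for the example, not measured browser behaviour or any site's real rules.

```python
# Hypothetical profile of a client that claims to be Netscape 6.
# The protocol version and header list are illustrative assumptions.
EXPECTED_PROFILES = {
    "Netscape6": {
        "http_version": "HTTP/1.1",
        "required_headers": {"accept", "accept-language", "accept-encoding"},
    },
}


def signature_mismatches(http_version, headers):
    """Return reasons why a request does not match the browser it claims to be."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")

    reasons = []
    for token, profile in EXPECTED_PROFILES.items():
        if token not in user_agent:
            continue  # The request does not claim to be this browser.
        if http_version != profile["http_version"]:
            reasons.append(f"claims {token} but speaks {http_version}")
        missing = profile["required_headers"] - set(lowered)
        if missing:
            reasons.append(f"claims {token} but omits {', '.join(sorted(missing))}")
    return reasons


faked = signature_mismatches(
    "HTTP/1.0",
    {"User-Agent": "Mozilla/5.0 (X11; U; Linux) Gecko/20011128 Netscape6/6.2.1"},
)
for reason in faked:
    print(reason)
```

    A request that honestly identifies itself (say, as libwww-perl) produces no mismatches here, because it isn't pretending to be anything else.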

    Bottom line: play nice. If you get blocked, you probably deserved it.