There are lots of things sites can do to decide whether they want to block traffic. IPs are only one example, and blocking by IP isn't common unless the admins believe that IP has deliberately attempted to DoS them or in some other way jeopardize their site. (For example: if you try to crawl http://shopping.yahoo.com/ to get all of their product data to build your own shopping portal, they will probably block your IP, whether you are doing it in a very low-intensity way or not.)

More generally, sites can analyze the "signature" of requests to decide whether they want to block you. By signature I mean anything that can make your requests stand out from those of the other 99% of their traffic. They might do it based on your User-Agent, or some other HTTP header that is unique to the API you are using, or they might do it based on some combination of things that help identify people who are being deceitful. If your User-Agent says you're Netscape 6 but you use "HTTP/1.0", that's a dead giveaway. Other, more subtle things might be discrepancies between the HTTP headers you send and the headers that Netscape 6 ALWAYS sends.
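
To make the "signature" idea concrete, here is a minimal sketch in Python of the kind of consistency check a site might run. Everything in it is an assumption for illustration: the `EXPECTED_PROFILES` table, the `signature_mismatches` function, and especially the list of headers a real Netscape 6 would send are hypothetical, not taken from any actual site or measured from the browser.

```python
# Hypothetical profile of what a site might expect from a client that
# claims to be Netscape 6. The header list is illustrative, not measured.
EXPECTED_PROFILES = {
    "Netscape6": {
        "http_version": "HTTP/1.1",
        "required_headers": {"accept", "accept-language", "accept-encoding"},
    },
}


def signature_mismatches(http_version, headers):
    """Return the reasons a request contradicts its own User-Agent claim."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "")

    reasons = []
    for token, profile in EXPECTED_PROFILES.items():
        if token not in user_agent:
            continue  # the request does not claim to be this browser
        if http_version != profile["http_version"]:
            reasons.append(f"claims {token} but speaks {http_version}")
        missing = profile["required_headers"] - set(normalized)
        for name in sorted(missing):
            reasons.append(f"claims {token} but omits {name}")
    return reasons


if __name__ == "__main__":
    # A script that borrows a browser's User-Agent but nothing else.
    spoofed = signature_mismatches(
        "HTTP/1.0",
        {"User-Agent": "Mozilla/5.0 (Windows; U; Win98; en-US) Netscape6/6.2.1"},
    )
    for reason in spoofed:
        print(reason)
```

A request that copies the User-Agent string but speaks HTTP/1.0 and sends no other headers trips every rule in the table, while a client that does not claim to be Netscape 6 is never compared against that profile. A real site would presumably weigh many more signals than this.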

Bottom line: play nice. If you get blocked, you probably deserved it.


In reply to Re: Re: Re: Automating a search using perl by hossman
in thread Automating a search using perl by Baz
