The difference is that somebody who's harvesting listings will be getting much more data than a legitimate user. So the problem reduces to keeping track of whether a series of requests has come from the same user or not.

The classic way to track a single user is with a cookie or session ID. One possibility would be to check for a cookie on the user's computer. If we don't find it, they get a message saying "Hang on while I register your computer with the system," and we just sleep(60), then set the cookie. After that, the cookie tracks how many searches they've done that day, and as the number gets too high, we sleep for longer and longer: essentially tarpitting for Web browsing.

That would mean that either the user cooperates with the cookie and we can limit how many listings they can retrieve in a day, or else they don't and they can get one listing per minute. With enough listings, the data would be stale before everything was retrieved.
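
To make the idea concrete, here is a rough sketch of the delay decision, written in Python only to keep it short and language-neutral. Apart from the 60-second registration wait, every number and name here (FREE_SEARCHES, BASE_DELAY, MAX_DELAY, delay_for_request) is an assumption made up for illustration, and the doubling schedule is just one possible way to "sleep for longer and longer."

```python
REGISTRATION_DELAY = 60.0  # seconds; the sleep(60) for a browser with no cookie
FREE_SEARCHES = 50         # assumed daily allowance before any slowdown
BASE_DELAY = 1.0           # assumed first penalty delay, in seconds
MAX_DELAY = 300.0          # assumed upper bound on any single delay


def delay_for_request(has_cookie, searches_today):
    """Return how many seconds to sleep before answering this request.

    searches_today counts this request too, so the first search over
    the allowance is FREE_SEARCHES + 1.
    """
    if not has_cookie:
        # No cookie: treat every request as a new registration.
        return REGISTRATION_DELAY
    over = searches_today - FREE_SEARCHES
    if over <= 0:
        # Within the daily allowance: no delay at all.
        return 0.0
    # Double the delay for each search past the allowance. The exponent
    # is capped so the intermediate value stays small; the result is
    # then limited to MAX_DELAY.
    exponent = min(over - 1, 16)
    return min(BASE_DELAY * 2 ** exponent, MAX_DELAY)
```

A client that refuses the cookie always takes the 60-second branch, so it tops out at 86400 / 60 = 1440 listings per day, while a cooperating client is held to roughly the daily allowance before the delays start to bite.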

The inconvenience wouldn't be too bad, since in normal operation it would be a single 60-second wait, once per computer.

Of course, it would be terrible for anyone with cookies disabled or blocked, since they would hit the 60-second wait on every request.


In reply to Re: Re: State-of-the-art in Harvester Blocking by sgifford
in thread State-of-the-art in Harvester Blocking by sgifford
