I think limiting the number of displayed results is the most promising path here. I'd do two things: pick a (probably weighted) random subset of results to display, and do this regardless of how the information was requested. This may demand some tolerance from regular customers, but it shouldn't actually hinder regular business, while being a huge pain in the buttocks for a spider. Also, somebody refreshing the same results page more than thrice or so to get more of the results would give themselves away as a spider.
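To make the first idea a bit more concrete, here is a minimal sketch in Python (not anything from a real site). It assumes each result carries a positive weight of your choosing, draws the subset without replacement using the Efraimidis-Spirakis key trick, and counts repeated requests for the same query per client. The names `weighted_subset` and `RefreshWatcher`, and the limit of 3, are illustrative assumptions only.

```python
import random
from collections import Counter


def weighted_subset(results, weights, k, rng=random):
    """Pick up to k distinct results, favouring higher weights.

    Each item gets the key u ** (1 / weight) with u uniform in [0, 1);
    the k largest keys win (Efraimidis-Spirakis weighted sampling).
    Items with a non-positive weight are never shown.
    """
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    keyed = []
    for item, weight in zip(results, weights):
        if weight <= 0:
            continue
        keyed.append((rng.random() ** (1.0 / weight), item))
    # Sort on the key only, so the items themselves need not be comparable.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


class RefreshWatcher:
    """Counts how often one client asks for the same query (hypothetical helper)."""

    def __init__(self, limit=3):
        self.limit = limit          # "more than thrice or so"
        self.counts = Counter()

    def record(self, client, query):
        """Register one request; return True once the client exceeds the limit."""
        self.counts[(client, query)] += 1
        return self.counts[(client, query)] > self.limit
```

How you identify a "client" (IP, session cookie, account) is left open here; that choice matters more than the counting itself.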

Another idea would require you to paginate your results and look up the search parameters in a table keyed by unique IDs. These IDs would be used in your "next page" / "last page" links instead of having the parameters right inside the link. You would then put a random number of invisible links (modulo text mode browsers, unfortunately) with invalid search IDs around the "next page" / "last page" links. Someone who keeps stumbling into these blind links is obviously a spider.
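A rough sketch of that ID table, again in Python and under stated assumptions: IDs are random hex tokens, the table lives in memory, and any request for an ID that is not in the table counts as a trap hit. The class name `SearchLinkTable`, its methods, and the `trap_limit` of 2 are made up for illustration; a real setup would persist the table and expire old entries.

```python
import random
import secrets


class SearchLinkTable:
    """Maps opaque search IDs to search parameters and counts trap hits."""

    def __init__(self, trap_limit=2):
        self.params_by_id = {}      # valid ID -> search parameters
        self.trap_hits = {}         # client -> number of invalid IDs requested
        self.trap_limit = trap_limit

    def register(self, params):
        """Store the parameters and return the ID to put in a page link."""
        search_id = secrets.token_hex(8)
        self.params_by_id[search_id] = dict(params)
        return search_id

    def make_decoys(self, max_decoys=4):
        """Return a random number of IDs that are deliberately NOT in the table.

        These go into the invisible links placed around the real ones.
        """
        decoys = []
        for _ in range(random.randint(1, max_decoys)):
            decoy = secrets.token_hex(8)
            while decoy in self.params_by_id:   # practically never happens
                decoy = secrets.token_hex(8)
            decoys.append(decoy)
        return decoys

    def resolve(self, client, search_id):
        """Return the stored parameters, or None (and note a trap hit)."""
        if search_id in self.params_by_id:
            return self.params_by_id[search_id]
        self.trap_hits[client] = self.trap_hits.get(client, 0) + 1
        return None

    def looks_like_spider(self, client):
        """True once the client has followed trap_limit or more blind links."""
        return self.trap_hits.get(client, 0) >= self.trap_limit
```

Requiring more than one hit before flagging leaves a little room for a human who mangles a URL by hand, at the cost of letting a spider see one more page.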

The second idea isn't completely airtight, but it's very difficult to circumvent anyway. If you combine the two, the people using the spiders will have to pay someone a lot of money to write a spider that can harvest your site without getting caught by the traps. Even then, you could still check the logs manually.

Makeshifts last the longest.


In reply to Re: State-of-the-art in Harvester Blocking by Aristotle
in thread State-of-the-art in Harvester Blocking by sgifford
