The difference is that somebody who's harvesting listings will be pulling down much more data than a legitimate user. So the problem reduces to keeping track of whether a series of requests has come from the same user or not.
The classic way to track a single user is with a cookie or session ID. One possibility would be to check for a cookie on the user's computer. If we don't find one, they get a message saying "Hang on while I register your computer with the system," we sleep(60), and then we set the cookie. After that, the cookie tracks how many searches they've done that day, and once the number gets too high we start sleeping for longer and longer, essentially tarpitting for Web browsing.
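Something along these lines would do it. This is only a minimal sketch in Python/Flask, since the idea is language-agnostic; the thresholds, the in-memory counter, and the do_search() handler are placeholders I've made up, not anything from the thread:

```python
# Sketch of the cookie-based tarpit described above.
# Assumptions (mine, not from the post): a Flask app, an in-memory
# counter keyed by a random token, and illustrative thresholds.
import time
import uuid
from collections import defaultdict
from datetime import date

from flask import Flask, request, make_response

app = Flask(__name__)

REGISTER_DELAY = 60          # one-time wait while "registering" the computer
FREE_SEARCHES_PER_DAY = 100  # searches allowed before tarpitting kicks in
EXTRA_DELAY_PER_SEARCH = 2   # seconds added for each search over the limit

# token -> (day, count); a real site would keep this in a database
searches = defaultdict(lambda: (date.today(), 0))

@app.route("/search")
def search():
    token = request.cookies.get("visitor")
    if token is None:
        # No cookie: "register" the computer with a one-time delay,
        # then hand out a token.  Cookie-less clients pay this every time.
        time.sleep(REGISTER_DELAY)
        resp = make_response("Registered; please retry your search.")
        resp.set_cookie("visitor", uuid.uuid4().hex)
        return resp

    day, count = searches[token]
    if day != date.today():
        day, count = date.today(), 0     # reset the daily counter
    count += 1
    searches[token] = (day, count)

    # Tarpit: the further past the daily limit, the longer the wait.
    over = count - FREE_SEARCHES_PER_DAY
    if over > 0:
        time.sleep(over * EXTRA_DELAY_PER_SEARCH)

    return do_search(request.args)       # hypothetical search handler
```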
That would mean that either the user cooperates with the cookie, so we can limit how many listings they retrieve in a day, or they don't, and they can get one listing per minute. With enough listings, the data would be stale before everything could be retrieved.
The inconvenience wouldn't be too bad, since in normal operation it would amount to a single 60-second wait, ever, for each computer.
Of course, it would be terrible if cookies were disabled or blocked.