in reply to State-of-the-art in Harvester Blocking
No matter what you do implementation-wise, I don't see many options here. You've made the data available to the public, so how are you going to tell who's "real" from who's a program trying to grab all your results?
That would be like PerlMonks creating a security layer that only allows a given computer to access a set number of nodes per hour. Things just don't work like that. As for going by IP address, I wouldn't even consider it. There are too many proxies, dynamic IPs and the like around: many users can share one address, and one user can show up under many. Using an IP address to pinpoint a specific user just doesn't work as well as it might have in the past.
The only method I can think of that would be scalable and programmatically simple would be to require authentication before anyone is permitted to browse the listings. Then you could limit the number of listings each account can view per week, or something like that.
But even that can be circumvented. The person harvesting your database simply registers multiple usernames (as many as it takes) to do the job: the first user grabs the first 30 listings, the second grabs the next 30, and so on. So there is no way that I can think of to solve this "problem" with 100% effectiveness. The data is freely available. If your site dishes it out, there's no way to stop it from being programmatically stolen.
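To make that concrete, here's a rough sketch of the per-account limit and of how rotating usernames gets around it. It's in Python just for brevity, and everything in it is made up for illustration: the `ListingQuota` class, the `harvest` function, the 30-per-week figure and the in-memory dictionary are all assumptions, not anything from a real site.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60


class ListingQuota:
    """Hypothetical per-account cap on listings viewed per weekly window."""

    def __init__(self, limit_per_week=30):
        self.limit = limit_per_week
        # username -> (window_start_time, listings_viewed_in_window)
        self.usage = {}

    def allow(self, username, now):
        """Return True and count the view if the account is under its limit."""
        start, count = self.usage.get(username, (now, 0))
        if now - start >= WEEK_SECONDS:
            # The old window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            self.usage[username] = (start, count)
            return False
        self.usage[username] = (start, count + 1)
        return True


def harvest(quota, accounts, total_listings, now=0):
    """Simulate a harvester that moves to the next account when one is blocked."""
    fetched = 0
    for account in accounts:
        while fetched < total_listings and quota.allow(account, now):
            fetched += 1
    return fetched


# One account is stopped at the limit...
single = harvest(ListingQuota(30), ["user1"], 90)
# ...but three throwaway accounts cover all 90 listings between them.
rotated = harvest(ListingQuota(30), ["user1", "user2", "user3"], 90)
```

The limit does its job per account, and that's exactly the weakness: it only raises the cost to one registration per 30 listings.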