I've got a Perl script which manages a medium-sized database of real estate listings for a local realtors' association. It's accessible to the public, but lately we've seen in our logs a few people going through the whole database sequentially. Presumably these people are harvesting the database.
The realtors' association would like to stop this. What are some of the better techniques for doing this, with minimal annoyance to real customers, and without making the site inaccessible to people with text-only browsers or visual impairments?
I've had some ideas already, but I'm hoping for something better...
- Limit how many listings the "show me more" button will reveal for a single search. This can be subverted by searching in ranges (show me houses that cost 0-1000 dollars; now 1001-2000; now 2001-3000; etc.).
- Monitoring logs and blocking IP addresses. This doesn't work well if the harvester has a dynamic IP address, and is time-consuming besides.
- Limiting the number of requests from one IP address in a day. This is annoying for large offices behind a NAT, or AOL users who share the same proxy. It also doesn't really prevent anything; it just makes the harvester spread the job over several days.
- Taking legal action. This is expensive.
- Using the "make them transcribe an image" trick. This won't work for the visually impaired, or people with text-only browsers.
- Trying to obscure the contents of the pages with clever CSS and so forth. This could make the site inaccessible, and is bound to lead to an arms race: the harvester figures out the obfuscation, I create a more complex one, and so forth.
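To make the per-IP limiting idea concrete, here is a rough sketch of what I have in mind, using a sliding window rather than a fixed daily count. The names `allow_request`, `$MAX_HITS` and `$WINDOW` and the numbers are made up for illustration, and the in-memory hash assumes a persistent process; under plain CGI the counts would have to live in a DBM file or a database table instead. It has the same NAT and shared-proxy drawback described above.

```perl
use strict;
use warnings;

# Hypothetical limits: at most $MAX_HITS listing views per address
# within any $WINDOW-second period. Both numbers are guesses to tune.
my $MAX_HITS = 200;
my $WINDOW   = 3600;

my %hits;    # address => array ref of request timestamps (epoch seconds)

# Returns true if the request should be served, false if the address
# has used up its allowance for the current window.
sub allow_request {
    my ($addr, $now) = @_;
    $now = time unless defined $now;

    my $seen = $hits{$addr} ||= [];

    # Forget requests that have aged out of the window.
    shift @$seen while @$seen && $seen->[0] <= $now - $WINDOW;

    # Over the limit: refuse, and do not count the refused request.
    return 0 if @$seen >= $MAX_HITS;

    push @$seen, $now;
    return 1;
}
```

The script would call something like `allow_request($ENV{REMOTE_ADDR})` before running a search and show a polite "try again later" page when it returns false.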
I'm really just fishing for ideas, so if anybody has any thoughts I'd love to hear them. Thanks!