mickie2000 has asked for the wisdom of the Perl Monks concerning the following question:

I "run" a web site that has a sizable online database (in excess of 1 million listings), and after reviewing my log files recently I have found that a few folks are taking advantage of my kindness: some insurance companies are opening my directory in the morning and cold calling everybody. I want to enforce the following security measures:

-- Time out my search pages (http://www.contractorresource.com/cgi-bin/search.cgi) after 20 minutes so people can't keep the directory open all day.

-- I also want to limit each user/IP to 500 database results in 1 day or 1500 in 1 week; if either limit is exceeded, I want to "blacklist" the user and forbid them access to my db.
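A minimal sketch of the daily/weekly limit, in Perl since that is what the site already runs. It assumes in-memory hashes stand in for whatever persistent store (a DBM file or database table) a real search.cgi would need, since each CGI request starts a fresh process; the sub name `record_results` and the key format are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Limits from the question: 500 results per day or 1500 per week.
my %LIMIT  = (day => 500,   week => 1500);
my %WINDOW = (day => 86400, week => 7 * 86400);    # window lengths in seconds

# In-memory stand-ins (assumption): a real CGI script would keep these in a
# DBM file or database table, because each request runs in a new process.
my %log;          # key (IP or user) => [ [epoch, results_shown], ... ]
my %blacklist;    # key => 1 once a limit has been exceeded

# Record that $key was shown $n results at time $now.
# Returns 1 if the request is allowed, 0 if the key is blacklisted.
sub record_results {
    my ($key, $n, $now) = @_;
    return 0 if $blacklist{$key};

    # Keep only entries inside the longest window, then add this request.
    my $events = $log{$key} ||= [];
    @$events = grep { $now - $_->[0] < $WINDOW{week} } @$events;
    push @$events, [ $now, $n ];

    # Sum the results shown inside each window and compare with its limit.
    for my $period (keys %LIMIT) {
        my $total = 0;
        for my $e (@$events) {
            $total += $e->[1] if $now - $e->[0] < $WINDOW{$period};
        }
        if ($total > $LIMIT{$period}) {
            $blacklist{$key} = 1;
            return 0;
        }
    }
    return 1;
}

# Example: a visitor pulling 100 results per search is cut off on the sixth search.
my $t = time;
for my $search (1 .. 6) {
    my $ok = record_results('192.0.2.10', 100, $t + $search * 60);
    print "search $search: ", ($ok ? "allowed" : "blacklisted"), "\n";
}
```

Keying on a login name rather than an IP address makes the limit much harder to dodge, for the reasons given in the replies below.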

If anyone can help me I would REALLY appreciate it. I know just enough Perl to be dangerous, and I don't want to start this project with my head in the clouds.

Re: Database Security
by thraxil (Prior) on Apr 02, 2002 at 17:07 UTC

    how about just not making the info public?

    there really isn't any other reliable way to do it. whatever method you use to prevent people from spending more than 20 minutes on the search results page can be easily circumvented by the user just hitting Save in their browser and storing a local copy. IP limits are largely ineffective (IPs are not that hard to forge, or to just change every couple of minutes if they have control over their local network). i once consulted on what was basically a large poll site, and one of my tasks was to develop strategies to block people from voting multiple times. the straightforward approach of tracking IP addresses wouldn't cut it in the real world; what was eventually needed was some rough AI that would detect and flag input patterns that look suspicious (lots of the same vote coming in rapid succession or from similar/sequential IP addresses, etc.). if they're determined, there's very little you can do without expending a huge effort.
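    One crude version of the pattern-flagging idea above, sketched in Perl: group requests by /24 network so that a run of similar or sequential addresses counts as one source, and flag any network that makes more than a set number of requests inside a sliding time window. The thresholds and the `suspicious_subnets` name are assumptions for illustration; this is not the system described in the reply, and it only handles IPv4.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a dotted-quad IPv4 address to its /24 prefix: "10.1.2.3" -> "10.1.2".
sub subnet24 {
    my ($ip) = @_;
    return join '.', (split /\./, $ip)[0 .. 2];
}

# $hits: arrayref of [epoch_seconds, ip] pairs, sorted by time.
# Flags any /24 network with more than $max_hits requests inside any
# $window-second span. Returns the flagged prefixes, sorted.
sub suspicious_subnets {
    my ($hits, $max_hits, $window) = @_;
    my (%recent, %flagged);
    for my $hit (@$hits) {
        my ($t, $ip) = @$hit;
        my $net = subnet24($ip);
        my $q = $recent{$net} ||= [];
        push @$q, $t;
        # Slide the window: forget requests that are now too old.
        shift @$q while @$q && $t - $q->[0] >= $window;
        $flagged{$net} = 1 if @$q > $max_hits;
    }
    return sort keys %flagged;
}

# Example: 120 requests in two minutes from 203.0.113.1 .. 203.0.113.120,
# plus one request from an unrelated network.
my @demo = map { [ 1000 + $_, "203.0.113.$_" ] } 1 .. 120;
push @demo, [ 1200, '198.51.100.7' ];
print "suspicious: $_\n" for suspicious_subnets(\@demo, 100, 300);
```

    A flagged network is only a lead for a human to look at; a /24 can just as easily be an office or ISP pool full of legitimate visitors.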

    is this data that the people listed added to your site or was it public data that you collected? have the people listed in the database consented to having their info listed?

    what is your ultimate goal in blocking these people? is it to protect the people in the database from being called by insurance companies or are you somehow losing customers/revenue because they're only loading the pages once?

    anders pearson

Re: Database Security
by Ryszard (Priest) on Apr 02, 2002 at 21:45 UTC
    merlyn once did a column on preventing robots from stuffing vote polls. The idea was to prevent automated techniques from making multiple votes for a poll. The technique used was generating a code on an image and getting the user to enter the code along with their vote.

    Why not apply this technique to your website? For each search you could generate a "nondeterminate" number that a user has to enter each time they do a search.

    The code would last only a few mins and would not be reusable in the short term.

    In HTTP there is no state, so a user can't keep your directory "open" in a literal sense. If what you mean is that a search page is being refreshed all the time, that's easy: embed a hidden (nondeterminate) value in your page and store it in a database along with its creation time. When the user hits refresh, look up the stored timestamp for the hidden token and return a timeout page if the difference is over your threshold.

    A non-determinate value could be generated like this:

        use Digest::MD5 qw(md5_hex);
        my $ndv      = md5_hex('53cr3t 57r1n6' . $$ . rand() . localtime());
        my $smallndv = substr($ndv, 0, 4);
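    A rough sketch of the token-plus-timestamp check described in this reply, reusing the same Digest::MD5 idea. The hash `%issued` is a hypothetical stand-in for the database table that maps each token to its creation time, the 20-minute limit is taken from the question, and `issue_token` and `check_token` are illustrative names only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $TIMEOUT = 20 * 60;    # seconds; 20-minute limit from the question

# Hypothetical stand-in for a database table of token => creation time.
my %issued;

# Create a token to embed in the search form as a hidden field.
sub issue_token {
    my ($now) = @_;
    my $token = md5_hex('53cr3t 57r1n6' . $$ . rand() . $now);
    $issued{$token} = $now;
    return $token;
}

# Check the token sent back with a search: 'ok', 'expired' or 'unknown'.
sub check_token {
    my ($token, $now) = @_;
    return 'unknown' unless defined $token && exists $issued{$token};
    return 'expired' if $now - $issued{$token} > $TIMEOUT;
    return 'ok';
}

my $now   = time;
my $token = issue_token($now);
print check_token($token, $now + 5 * 60), "\n";     # ok
print check_token($token, $now + 25 * 60), "\n";    # expired
```

    For a single-use variant, `check_token` could also `delete $issued{$token}` once a token has been accepted, so the same saved page cannot be submitted twice. Old rows would need purging periodically either way.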
      merlyn did a column once on anti robot stuffing of vote polls. The idea was to prevent automated techniques from making multiple votes for a poll. The technique used was generating a code on an image and getting the user to enter the code along with their vote.

      See jcwren's A little fun with merlyn for a way to bypass such a script :-).

Re: Database Security
by Clownburner (Monk) on Apr 02, 2002 at 19:49 UTC

    On the first point, this *can* be done with a little work, although it's not perfect. Use a little server-side code to create a one-time code for use in the search script. Store these codes in a database along with a timestamp of when they were created, and if the code is older than 30 minutes, fail the search. It won't stop them from loading a fresh page, but it would keep them from using the same page over and over again. You could also do the same thing with cookies, which would be a tiny bit harder for the user to work around.

    Perhaps a better solution is to take the database 'private' and require registration before use - you could then track who did what, and manually blacklist the abusers.

    None of that is impregnable, but the goal of any security is simply to make it too much trouble for the would-be attacker compared with the value of the data.

    Salespeople are persistent and have a lot of time on their hands, but are not usually very technical. Combine HTTP basic authentication with a registration process and a cookie to track the number of searches, and you'd probably block 90% of them.
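    One way to keep the search-count cookie from being edited by hand is to sign it with an HMAC, sketched here with the core Digest::SHA module. The secret, the `user:count:signature` layout and the sub names are assumptions, and user names are assumed not to contain colons. A visitor can still simply delete the cookie, which is why the reply pairs it with a login.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side secret; never sent to the browser.
my $SECRET = 'change-me';

# Build a cookie value "user:count:signature" so the count cannot be
# changed by hand without invalidating the signature.
sub make_counter {
    my ($user, $count) = @_;
    my $data = "$user:$count";
    return "$data:" . hmac_sha256_hex($data, $SECRET);
}

# Return the search count stored in a cookie value, or undef if the value
# is malformed, belongs to another user, or has been tampered with.
sub read_counter {
    my ($cookie, $user) = @_;
    return unless defined $cookie;
    my ($u, $count, $sig) = $cookie =~ /\A([^:]+):([0-9]+):([0-9a-f]{64})\z/
        or return;
    return unless $u eq $user;
    return unless hmac_sha256_hex("$u:$count", $SECRET) eq $sig;
    return $count;
}

my $cookie = make_counter('alice', 12);
print read_counter($cookie, 'alice'), "\n";    # 12
(my $forged = $cookie) =~ s/:12:/:0:/;
print defined(read_counter($forged, 'alice')) ? "accepted\n" : "rejected\n";    # rejected
```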


    "Non sequitur. Your facts are un-coordinated." - Nomad
Re: Database Security
by bastard (Hermit) on Apr 02, 2002 at 19:16 UTC
    a few tips, this won't solve it all though. in the code, make sure the user can't specify more than 30 for the max results per page:

        if ($x > 30) { $x = 30; }

    do the same for the radius. i'd probably cut that down to 30 miles as well. the trick is to make it harder for these people to get at the information. this would be easy to implement, but will only slow them down.

    you may want to require a user account be setup to view more than 10 results/page and 10 mile radius. make the account openings take a day, place caps, or generate reports of suspicious activity. disable account based on abuses of the caps or your judgement based on the reports. make sure it takes a day to re-activate the account once the user contacts you to get it restored. etc...
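    A minimal sketch of clamping the search parameters server-side, with 10/10 caps for anonymous visitors and 30/30 for account holders as suggested above. The parameter names `per_page` and `radius` are assumptions; the real names used by search.cgi are not known here. Missing or non-numeric values fall back to the cap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Caps suggested in the reply: anonymous visitors get 10 results per page
# within 10 miles; logged-in account holders get at most 30 and 30.
my %CAPS = (
    anonymous => { per_page => 10, radius => 10 },
    member    => { per_page => 30, radius => 30 },
);

# Force a user-supplied value into 1 .. $max. Missing or non-numeric
# input falls back to $max.
sub clamp {
    my ($value, $max) = @_;
    return $max unless defined $value && $value =~ /\A[0-9]+\z/;
    return 1    if $value < 1;
    return $max if $value > $max;
    return $value;
}

# $params: hashref of raw query parameters (names are hypothetical).
sub sanitize_search {
    my ($params, $is_member) = @_;
    my $caps = $CAPS{ $is_member ? 'member' : 'anonymous' };
    return {
        per_page => clamp($params->{per_page}, $caps->{per_page}),
        radius   => clamp($params->{radius},   $caps->{radius}),
    };
}

my $safe = sanitize_search({ per_page => 1000, radius => 500 }, 0);
print "per_page=$safe->{per_page} radius=$safe->{radius}\n";    # per_page=10 radius=10
```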

    as long as you offer the information, someone can build a system to harvest it. the best you can do is make it impractical so they look somewhere else.

Re: Database Security
by Desdinova (Friar) on Apr 02, 2002 at 23:28 UTC
    I looked at the site and I see why this is tough. A somewhat different approach could be to have the person enter an email address and email the results. Then you block certain domains (e.g. the insurance company's) from getting email. Not perfect, but perhaps less intrusive than other methods.
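    If results are emailed, the domain check could look something like this. The blocked domains are placeholders under the reserved `.example` TLD, and the check also matches subdomains on label boundaries, so an entry for `insurer.example` blocks `sales.insurer.example` but not `notinsurer.example`. A free webmail address sidesteps a check like this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder domains (reserved .example TLD); real entries would be the
# domains you see abusing the directory.
my %BLOCKED = map { $_ => 1 } qw(insurer.example agency.example);

# True if the address is malformed or its domain (or a parent domain)
# is on the blocklist.
sub email_blocked {
    my ($email) = @_;
    return 1 unless defined $email;
    my ($domain) = lc($email) =~ /\@([^\@\s]+)\z/ or return 1;
    my @labels = split /\./, $domain;
    # Try "a.b.c", then "b.c"; never the bare top-level label.
    for my $i (0 .. $#labels - 1) {
        return 1 if $BLOCKED{ join '.', @labels[ $i .. $#labels ] };
    }
    return 0;
}

for my $addr ('owner@sales.insurer.example', 'homeowner@mail.example.org') {
    printf "%s: %s\n", $addr, email_blocked($addr) ? 'blocked' : 'ok';
}
```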

    Another option is the courts: state an official policy on the use of the data and get a judge to rule on their misuse of the info.
Re: Database Security
by peterg22 (Novice) on Apr 03, 2002 at 13:44 UTC
    From a purely usability point of view, I would suggest not displaying the phone number directly in the page. By all means display name and location details, but if the vultures are just collecting phone numbers, then adding a link that will send the full details via email may help. I guess that the type of person who might use this service would not object to receiving an email (or how about via SMS through an email gateway?), but anyone looking to capture dozens of contacts will get fed up with having to do this for every contact. Just my 2 cents' worth.

    Mildew Hall.. Home of PurePostPro and other Perl goodies!
    Not only oysters create Pe[a]rls

Re: Database Security
by one4k4 (Hermit) on Apr 03, 2002 at 13:06 UTC
    How about having people create login accounts? They can agree to a terms of service, and you can make sure they're legit by sending a password to the e-mail address they supply? This way, since they're agreeing to the terms of service, you can implement some other features discussed above, and snag the guys who are violating them.

    _14k4 - perlmonks@poorheart.com (www.poorheart.com)
Re: Database Security
by Anonymous Monk on Apr 03, 2002 at 17:21 UTC
    Just so you know, web database "skimming" just came up in the supreme court and was deemed illegal. And you can copyright your databases. Please consult counsel on this.

    I also want to mention that no queries or other methods will keep people out of your data if you put it on the web. If someone is determined, they will get at it, even if they have to resort to using M$ Office and VB objects (yes, it is possible) and brute-forcing every possible ID key. (And that's why you have a cron job run a script that looks up the top site visitors.)

    Patrick
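    A sketch of the kind of cron-run script mentioned above: it counts requests to search.cgi per client address in a web server access log and prints the busiest ones. It assumes the Apache common or combined log format (client address first, request line in quotes); the top-20 cutoff and the sub names are arbitrary choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count requests to the search script per client address. Assumes the
# Apache common/combined log format: address is the first field and the
# request line is quoted.
sub count_searches {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        next unless $line =~ m{^(\S+) .*?"(?:GET|POST) /cgi-bin/search\.cgi};
        $count{$1}++;
    }
    return \%count;
}

# Return the $n busiest addresses as [address, count] pairs.
sub top_visitors {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    splice @sorted, $n if @sorted > $n;
    return map { [ $_, $count->{$_} ] } @sorted;
}

# Usage, e.g. from a nightly crontab entry: perl top_visitors.pl /path/to/access_log
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    printf "%-15s %d\n", @$_ for top_visitors(count_searches($fh), 20);
    close $fh;
}
```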