Balandar has asked for the wisdom of the Perl Monks concerning the following question:

I am working (well, in the planning stages, really) on a script that interacts with the Amazon.com Web Services. No, this is not for the contest. I have neither the time nor the experience in Perl to even consider such an undertaking. The documentation states, “Your Web site or application should make no more than one (1) request per second to our servers”. I am not sure how to go about this in Perl.

What I had in mind is storing the current time in seconds (epoch or something) in a MySQL database after every request made to Amazon.com. Then, whenever a user runs the script and it is about to make a request to Amazon.com, it would read that stored value, compare it with the current time, and if they are the same, sleep(1000) and check again. If they are not the same, it would make the request to Amazon.com. This seems a bit sloppy to me; there has to be a more efficient way, and all of these database accesses seem a bit much. I was also thinking about creating a daemon, but it seems that my hosting provider charges an extra fee for running daemons.
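
Roughly, I was picturing something like this (untested; a made-up one-row table last_request with a single INT column ts holding the epoch time of the last request):

    use DBI;

    # made-up connection details
    my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'password',
                           { RaiseError => 1 });

    my ($last) = $dbh->selectrow_array('SELECT ts FROM last_request');

    # wait until we are past the second in which the last request was made
    sleep 1 while time <= $last;

    # ... make the Amazon.com request here ...

    # record the time of this request for the next run
    # (note: two copies of the script could still race between the SELECT
    #  and this UPDATE, which is part of what I am unsure how to handle)
    $dbh->do('UPDATE last_request SET ts = ?', undef, time);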

Any suggestions would be helpful. I am still new to Perl, so please bear with me.

Thanks


Re: How do I prevent more than one request (to some URL) per second with CGI?
by perlplexer (Hermit) on Nov 27, 2002 at 20:54 UTC
    Not sure if I understand what you're trying to accomplish here... Is there a concern that the script can be started by multiple users simultaneously? Or do you simply need a delay between successive requests made by the same process?
    If the latter is the case, you have your answer already: sleep().

    If not, and you need logic to gate requests from multiple instances of the same script, you can use a lock file.
    use Fcntl ':flock';

    my $file = 'file.lck';
    open my $lfh, '>>', $file or die "Can't access $file: $!\n";

    if (flock $lfh, LOCK_EX) {
        # wait until at least one second has passed since the lock file
        # was last touched by the previous request
        sleep 1 while (time - (stat $file)[9] < 1);

        #
        # Send your Amazon requests here
        #

        # record the time of this request in the file's mtime
        utime time, time, $file;
    }
    else {
        print "Can't lock $file: $!";
    }
    close $lfh;

    The code is untested and may not work if you simply cut and paste it. It should, however, give you an idea of how to approach the problem.

    --perlplexer
      That's what I was looking for. I didn't even consider locking a file. Thanks for the help perlplexer.
        With that in mind, it may be worth looking at merlyn's Highlander - allow only one invocation at a time of an expensive CGI script. If you combine this with a sleep within your script such that execution time is greater than a second, it will effectively achieve your goal.

         

        perl -e 'print+unpack("N",pack("B32","00000000000000000000000111101100")),"\n"'

Re: How do I prevent more than one request (to some URL) per second with CGI?
by tachyon (Chancellor) on Nov 27, 2002 at 20:31 UTC
    use LWP::Simple;
    use Time::HiRes qw( time sleep );

    while (1) {
        my $time = time;
        print "Time is $time\n";
        # the get will generally take > 1 sec but comment out to demo time loop
        # get('http://foo.com');
        sleep 0.001 while time < $time + 1;
    }

    __DATA__
    Time is 1038429003.031
    Time is 1038429004.031
    Time is 1038429005.031
    Time is 1038429006.031

    Alternatively, without Time::HiRes or any accuracy...

    use LWP::Simple;

    while (1) {
        get('http://foo.com');
        sleep 1;
    }

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: How do I prevent more than one request (to some URL) per second with CGI?
by Aristotle (Chancellor) on Nov 27, 2002 at 21:17 UTC
    You could do that relatively easily by wrapping an HTTP::Daemon around an LWP::RobotUA (using $robot->delay(1/60);) to create a proxy that serves only one request per second per host. Be sure to reject URLs for any servers other than the one you want, and preferably have it listen only on localhost. Then just use LWP::UserAgent as usual, with localhost and the appropriate port set as the proxy. (You will need a fair amount of proficiency with the HTTP protocol to get it all right, of course, but this should be by far the fastest approach to get going.)
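
    A rough, untested sketch of such a proxy (made-up port, host name, and agent details; one client at a time, no error handling):

    use strict;
    use warnings;
    use HTTP::Daemon;
    use LWP::RobotUA;

    my $allowed_host = 'xml.amazon.com';   # made up -- the one host we proxy for

    # RobotUA enforces a minimum delay between requests to the same host
    # (it will also fetch that host's robots.txt before the first request)
    my $ua = LWP::RobotUA->new('my-throttle-proxy/0.1', 'me@example.com');
    $ua->delay(1/60);                      # delay() takes minutes; 1/60 min = 1 second

    my $d = HTTP::Daemon->new(
        LocalAddr => '127.0.0.1',          # listen on localhost only
        LocalPort => 8080,                 # made-up port
    ) or die "Can't start daemon: $!";

    while (my $c = $d->accept) {
        while (my $req = $c->get_request) {
            my $host = eval { $req->uri->host } || '';
            if ($host ne $allowed_host) {
                $c->send_error(403);       # refuse URLs for any other server
                next;
            }
            # the RobotUA sleeps as needed before passing the request on
            $c->send_response($ua->request($req));
        }
        $c->close;
    }

    The CGI script itself would then just call $ua->proxy(http => 'http://127.0.0.1:8080/') on its LWP::UserAgent and request the Amazon URL as usual.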

    Makeshifts last the longest.

Re: How do I prevent more than one request (to some URL) per second with CGI?
by Balandar (Acolyte) on Nov 27, 2002 at 20:53 UTC

    IPC::Shareable and IPC::ShareLite look like possibilities.

    An example of what I meant in the first post is as follows. Take two users, both of whom click, at exactly the same time, on a link that gets XML data from Amazon.com. What happens when the script goes to get the data? It will make two calls to the Amazon servers at the same time, which is more than one (1) request per second. This is what I am trying to prevent, and why I was going to try the database route. I may be wrong, though. Would the system actually run the same script twice within one second? Or would there be enough time (1 second) between executions that I do not even need to worry about two or more requests occurring at the same time?

      In that case, look at lockfiles - merlyn wrote a good article, "The poor man's load balancer", in the late Web Techniques magazine, which he has very kindly kept online here.

      .02

      cLive ;-)

Re: How do I prevent more than one request (to some URL) per second with CGI?
by ehdonhon (Curate) on Nov 27, 2002 at 21:08 UTC

    Your main problem is going to be with race conditions. What if two different processes are started at the exact same time? Simply telling each one to sleep, as others have suggested, will not help. They will both sleep the same amount of time, then both execute at the same time.

    You need a way to implement some sort of mutual exclusion lock (or semaphore). Each process would attempt to get the lock. Once a process obtains the lock, it would be the responsibility of that process to make sure it held the lock for at least one second. Each process is only allowed to execute requests while it is holding the lock.
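
    For example, a rough, untested sketch of that idea using flock on a made-up lock-file path:

    use Fcntl ':flock';
    use Time::HiRes qw( time sleep );

    # /tmp/amazon.lck is a made-up path; any file all instances can reach works
    open my $lock, '>>', '/tmp/amazon.lck' or die "Can't open lock file: $!";

    flock $lock, LOCK_EX or die "Can't get lock: $!";   # blocks until we own it
    my $start = time;

    # ... make the single Amazon request here ...

    # keep holding the lock until a full second has passed since we took it,
    # so the next process cannot start its request any sooner than that
    my $elapsed = time - $start;
    sleep(1 - $elapsed) if $elapsed < 1;

    flock $lock, LOCK_UN;
    close $lock;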

Re: How do I prevent more than one request (to some URL) per second with CGI?
by belg4mit (Prior) on Nov 27, 2002 at 20:41 UTC
    If you ran on a Mac as a synchronous CGI and your script took a second to run, you'd be home by now.

    Using IPC::Shareable or IPC::ShareLite might also be cleaner.

    --
    I'm not belgian but I play one on TV.

Re: How do I prevent more than one request (to some URL) per second with CGI?
by talexb (Chancellor) on Nov 27, 2002 at 20:32 UTC

    At the risk of stating the obvious, how about sleep 1 each time through the loop?

    --t. alex
    but my friends call me T.
Re: How do I prevent more than one request (to some URL) per second with CGI?
by ibanix (Hermit) on Nov 28, 2002 at 05:57 UTC
    "No more than 1 request/second" can be sorta tricky.

    No more than 1 GET request? Do images on a page count?

    By the way, if you write your application to send back a redirect URL (HTTP code 302, I believe?), you can "bypass" the limit -- because the request is coming from the user's browser.
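
    Something like this untested snippet, with a hypothetical, already-built Amazon URL:

    use CGI qw(:standard);

    # hand the browser a 302 so the Amazon request comes from the user's
    # machine rather than from this script ($amazon_url is hypothetical)
    my $amazon_url = 'http://xml.amazon.com/onca/xml3?...';
    print redirect($amazon_url);   # CGI.pm's redirect() sends a 302 by default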

    It sounds like you're grabbing some info and processing before you spit it back out to the user. Have you thought about caching data so you don't have to do a request every time?
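
    For instance, a rough, untested file-based cache (made-up cache path, TTL, and request URL):

    use LWP::Simple qw(get);

    my $cache_file = '/tmp/amazon-response.xml';   # made-up cache location
    my $ttl        = 600;                          # re-fetch after 10 minutes
    my $url        = 'http://xml.amazon.com/...';  # hypothetical request URL

    my $xml;
    if (-e $cache_file && time - (stat $cache_file)[9] < $ttl) {
        # serve the saved copy instead of hitting Amazon again
        open my $fh, '<', $cache_file or die "Can't read cache: $!";
        local $/;                                  # slurp the whole file
        $xml = <$fh>;
    }
    else {
        $xml = get($url) or die "Request to Amazon failed";
        open my $fh, '>', $cache_file or die "Can't write cache: $!";
        print {$fh} $xml;
    }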

    <-> In general, we find that those who disparage a given operating system, language, or philosophy have never had to use it in practice. <->
      Images are unlikely to be of interest when fetching data into a script. A redirect is obviously useless, since it doesn't get the data to the script. The cache, too, must be synchronized so that it doesn't update faster than once per second, so using a cache does not by itself ensure strict compliance with the requirement under all circumstances. If you already have some sort of serializing mechanism, though, adding caching to it is an excellent proposal.

      Makeshifts last the longest.