Balandar has asked for the wisdom of the Perl Monks concerning the following question:

I am working (well, in the planning stages, really) on a script that interacts with the Amazon.com Web Services. No, this is not for the contest. I have neither the time nor the experience in Perl to even consider such an undertaking. The documentation states, “Your Web site or application should make no more than one (1) request per second to our servers”. I am not sure how to go about this in Perl.

What I had in mind is storing the current time in seconds (epoch or something) in a MySQL database after every request made to Amazon.com. Then, whenever a user runs the script and it is about to make a request to Amazon.com, it would read that stored value, compare it with the current time, and if they are the same, sleep(1000) and check again. If they are not the same, it would make the request to Amazon.com. This seems a bit sloppy to me; there has to be a more efficient way, and all of these database accesses seem a bit much. I was also thinking about creating a daemon, but it seems that my hosting provider charges an extra fee for running daemons.
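
Roughly, I was picturing something like this (untested; a made-up one-row table last_request with a single INT column ts holding the epoch time of the last request):

    use DBI;

    # made-up connection details
    my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'password',
                           { RaiseError => 1 });

    my ($last) = $dbh->selectrow_array('SELECT ts FROM last_request');

    # wait until we are past the second in which the last request was made
    sleep 1 while time <= $last;

    # ... make the Amazon.com request here ...

    # record the time of this request for the next run
    # (note: two copies of the script could still race between the SELECT
    #  and this UPDATE, which is part of what I am unsure how to handle)
    $dbh->do('UPDATE last_request SET ts = ?', undef, time);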

Any suggestions would be helpful. I am still new to Perl, so please bear with me.

Thanks


Re: How do I prevent more than one request (to some URL) per second with CGI?
by perlplexer (Hermit) on Nov 27, 2002 at 20:54 UTC
    Not sure if I understand what you're trying to accomplish here... Is there a concern that the script can be started by multiple users simultaneously? Or do you simply need a delay between successive requests made by the same process?
    If the latter is the case, you have your answer already: sleep().

    If not, and you need logic to gate requests from multiple instances of the same script, you can use a lock file.
    use Fcntl ':flock';

    my $file = 'file.lck';
    open my $lfh, '>>', $file or die "Can't access $file: $!\n";

    if (flock $lfh, LOCK_EX) {
        # wait until at least one second has passed since the lock file
        # was last touched by the previous request
        sleep 1 while (time - (stat $file)[9] < 1);

        #
        # Send your Amazon requests here
        #

        # record the time of this request in the file's mtime
        utime time, time, $file;
    }
    else {
        print "Can't lock $file: $!";
    }
    close $lfh;

    The code is untested and may not work if you simply cut and paste it. It should, however, give you an idea of how to approach the problem.

    --perlplexer
      That's what I was looking for. I didn't even consider locking a file. Thanks for the help perlplexer.
        With that in mind, it may be worth looking at merlyn's Highlander - allow only one invocation at a time of an expensive CGI script. If you combine this with a sleep within your script such that execution time is greater than a second, it will effectively achieve your goal.

         

        perl -e 'print+unpack("N",pack("B32","00000000000000000000000111101100")),"\n"'

Re: How do I prevent more than one request (to some URL) per second with CGI?
by tachyon (Chancellor) on Nov 27, 2002 at 20:31 UTC
    use LWP::Simple;
    use Time::HiRes qw( time sleep );

    while (1) {
        my $time = time;
        print "Time is $time\n";
        # the get will generally take > 1 sec but comment out to demo time loop
        # get('http://foo.com');
        sleep 0.001 while time < $time + 1;
    }

    __DATA__
    Time is 1038429003.031
    Time is 1038429004.031
    Time is 1038429005.031
    Time is 1038429006.031

    Alternatively, without Time::HiRes or any accuracy...

    use LWP::Simple;

    while (1) {
        get('http://foo.com');
        sleep 1;
    }

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: How do I prevent more than one request (to some URL) per second with CGI?
by Aristotle (Chancellor) on Nov 27, 2002 at 21:17 UTC
    You could do that relatively easily by wrapping an HTTP::Daemon around an LWP::RobotUA (using $robot->delay(1/60);) to create a proxy that serves only one request per second per host. Be sure to reject URLs for any servers other than the one you want, and preferably have it listen only on localhost. Then just use LWP::UserAgent as usual, with localhost and the appropriate port set as the proxy. (You will need a fair amount of proficiency with the HTTP protocol to get it all right, of course, but this should be by far the fastest approach to get going.)
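
    A rough, untested sketch of such a proxy (made-up port, host name, and agent details; one client at a time, no error handling):

    use strict;
    use warnings;
    use HTTP::Daemon;
    use LWP::RobotUA;

    my $allowed_host = 'xml.amazon.com';   # made up -- the one host we proxy for

    # RobotUA enforces a minimum delay between requests to the same host
    # (it will also fetch that host's robots.txt before the first request)
    my $ua = LWP::RobotUA->new('my-throttle-proxy/0.1', 'me@example.com');
    $ua->delay(1/60);                      # delay() takes minutes; 1/60 min = 1 second

    my $d = HTTP::Daemon->new(
        LocalAddr => '127.0.0.1',          # listen on localhost only
        LocalPort => 8080,                 # made-up port
    ) or die "Can't start daemon: $!";

    while (my $c = $d->accept) {
        while (my $req = $c->get_request) {
            my $host = eval { $req->uri->host } || '';
            if ($host ne $allowed_host) {
                $c->send_error(403);       # refuse URLs for any other server
                next;
            }
            # the RobotUA sleeps as needed before passing the request on
            $c->send_response($ua->request($req));
        }
        $c->close;
    }

    The CGI script itself would then just call $ua->proxy(http => 'http://127.0.0.1:8080/') on its LWP::UserAgent and request the Amazon URL as usual.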

    Makeshifts last the longest.

Re: How do I prevent more than one request (to some URL) per second with CGI?
by Balandar (Acolyte) on Nov 27, 2002 at 20:53 UTC

    IPC::Shareable and IPC::ShareLite look like possibilities.

    An example of what I meant in the first post is as follows. Take two users, both of whom click, at exactly the same time, on a link that gets XML data from Amazon.com. What happens when the script goes to get the data? It will make two calls to the Amazon servers at the same time, which is more than one (1) request per second. This is what I am trying to prevent, and why I was going to try the database route. I may be wrong, though. Would the system actually run the same script twice within one second? Or would there be enough time (1 second) between executions that I do not even need to worry about two or more requests occurring at the same time?

      In that case, look at lockfiles - merlyn wrote a good article, "The poor man's load balancer", in the late Web Techniques magazine, which he has very kindly kept online here.

      .02

      cLive ;-)

Re: How do I prevent more than one request (to some URL) per second with CGI?
by ehdonhon (Curate) on Nov 27, 2002 at 21:08 UTC

    Your main problem is going to be with race conditions. What if two different processes are started at the exact same time? Simply telling each one to sleep, as others have suggested, will not help. They will both sleep the same amount of time, then both execute at the same time.

    You need a way to implement some sort of mutual exclusion lock (or semaphore). Each process would attempt to get the lock. Once a process obtains the lock, it would be the responsibility of that process to make sure it held the lock for at least one second. Each process is only allowed to execute requests while it is holding the lock.
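
    For example, a rough, untested sketch of that idea using flock on a made-up lock-file path:

    use Fcntl ':flock';
    use Time::HiRes qw( time sleep );

    # /tmp/amazon.lck is a made-up path; any file all instances can reach works
    open my $lock, '>>', '/tmp/amazon.lck' or die "Can't open lock file: $!";

    flock $lock, LOCK_EX or die "Can't get lock: $!";   # blocks until we own it
    my $start = time;

    # ... make the single Amazon request here ...

    # keep holding the lock until a full second has passed since we took it,
    # so the next process cannot start its request any sooner than that
    my $elapsed = time - $start;
    sleep(1 - $elapsed) if $elapsed < 1;

    flock $lock, LOCK_UN;
    close $lock;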

Re: How do I prevent more than one request (to some URL) per second with CGI?
by belg4mit (Prior) on Nov 27, 2002 at 20:41 UTC
    If you ran on a Mac as a synchronous CGI and your script took a second to run, you'd be home by now.

    Using IPC::Shareable or IPC::ShareLite might also be cleaner.

    --
    I'm not belgian but I play one on TV.

Re: How do I prevent more than one request (to some URL) per second with CGI?
by talexb (Chancellor) on Nov 27, 2002 at 20:32 UTC

    At the risk of stating the obvious, how about sleep 1 each time through the loop?

    --t. alex
    but my friends call me T.
Re: How do I prevent more than one request (to some URL) per second with CGI?
by ibanix (Hermit) on Nov 28, 2002 at 05:57 UTC
    "No more than 1 request/second" can be sorta tricky.

    No more than 1 GET request? Do images on a page count?

    By the way, if you write your application to send back a redirect URL (HTTP code 302, I believe?), you can "bypass" the limit -- because the request is coming from the user's browser.
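
    Something like this untested snippet, with a hypothetical, already-built Amazon URL:

    use CGI qw(:standard);

    # hand the browser a 302 so the Amazon request comes from the user's
    # machine rather than from this script ($amazon_url is hypothetical)
    my $amazon_url = 'http://xml.amazon.com/onca/xml3?...';
    print redirect($amazon_url);   # CGI.pm's redirect() sends a 302 by default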

    It sounds like you're grabbing some info and processing before you spit it back out to the user. Have you thought about caching data so you don't have to do a request every time?
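
    For instance, a rough, untested file-based cache (made-up cache path, TTL, and request URL):

    use LWP::Simple qw(get);

    my $cache_file = '/tmp/amazon-response.xml';   # made-up cache location
    my $ttl        = 600;                          # re-fetch after 10 minutes
    my $url        = 'http://xml.amazon.com/...';  # hypothetical request URL

    my $xml;
    if (-e $cache_file && time - (stat $cache_file)[9] < $ttl) {
        # serve the saved copy instead of hitting Amazon again
        open my $fh, '<', $cache_file or die "Can't read cache: $!";
        local $/;                                  # slurp the whole file
        $xml = <$fh>;
    }
    else {
        $xml = get($url) or die "Request to Amazon failed";
        open my $fh, '>', $cache_file or die "Can't write cache: $!";
        print {$fh} $xml;
    }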

    <-> In general, we find that those who disparage a given operating system, language, or philosophy have never had to use it in practice. <->
      Images are unlikely to be of interest when fetching data into a script. A redirect is obviously useless, since it doesn't get the data to the script. The cache, too, must be synchronized so that it doesn't update faster than once per second, so using a cache does not by itself ensure strict compliance with the requirement under all circumstances. If you already have some sort of serializing mechanism, though, adding caching to it is an excellent proposal.

      Makeshifts last the longest.