cyber-guard has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a CGI script which, when a user connects, starts making simultaneous connections to different websites, searching for some information. Because each thread has the same goal, I just detach it, and the function run by the thread calls exit if the answer is found, ending the Perl process and all its threads.
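
Roughly, the pattern looks like this (a minimal sketch, not my real code; the URLs, the /ANSWER/ match, and the 30-second wait are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use CGI;
    use LWP::UserAgent;

    # Placeholder targets; the real script queries different websites.
    my @sites = ('http://example.com/a', 'http://example.com/b');

    sub search_site {
        my ($url) = @_;
        my $res = LWP::UserAgent->new( timeout => 10 )->get($url);
        if ( $res->is_success && $res->decoded_content =~ /ANSWER/ ) {
            print CGI->new->header('text/plain'), "found at $url\n";
            exit;    # by default, exit() in any thread ends the whole process
        }
    }

    # Detached threads: nothing joins them; the first hit exits everything.
    threads->create( \&search_site, $_ )->detach for @sites;
    sleep 30;    # main thread waits; give up if nothing is found in time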

The problem is that if the user disconnects, i.e. leaves the page before the answer is found, the script and all of its threads keep running. This of course creates a lot of needless overhead, so I was wondering: is there a way to check whether a user is still connected to the page?
thanks

Replies are listed 'Best First'.
Re: CGI and Threads
by zentara (Cardinal) on Jul 03, 2011 at 12:34 UTC
    I'm not a CGI guru, but until one answers you: you will probably need to set up a session; see Creating Sessions. A pure socket connection can detect a client disconnect, but a web server won't do that, as far as I know.

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
      I don't think that's an option, because I need to check the user's state periodically, and the only way I can think of doing something similar with sessions is to set a session expiry time. That's not exactly what I am looking for...
Re: CGI and Threads
by zentara (Cardinal) on Jul 03, 2011 at 16:16 UTC
    Most of the questions asked here about long-running CGI processes get solved by giving the client an id number and telling them to check back later, perhaps after an email notification is sent. Then you write their data to a database for them to retrieve. Maybe there is a way for JavaScript to send a POST when the user leaves your page? See javascript leave page. You could set a cookie or id, have the JavaScript send you the id number of the departing page, and then close up the processes associated with that id.
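
    For the "close up the processes" part, the server side can be tiny (a sketch; the pid-file layout and the cancel.cgi name are my inventions):

        #!/usr/bin/perl
        # cancel.cgi -- called by the page-leave JavaScript with the user's id
        use strict;
        use warnings;
        use CGI;

        my $q = CGI->new;
        my ($id) = ( $q->param('id') // '' ) =~ /^(\w+)$/;    # untaint the id
        $id //= '';
        my $pidfile = "/var/run/myapp/$id.pid";    # written by the search process

        if ( $id && -e $pidfile ) {
            open my $fh, '<', $pidfile or die "open $pidfile: $!";
            my $pid = <$fh> // '';
            chomp $pid;
            kill 'TERM', $pid if $pid =~ /^\d+$/;    # stop that user's searches
            unlink $pidfile;
        }
        print $q->header('text/plain'), "ok\n";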

    Can you use plain sockets? Sockets would do it easily (well, there are a lot of details) :-). Or are you stuck going through a web server?
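
    For reference, detecting the disconnect on a plain socket looks like this (a standalone-daemon sketch; the port and the work loop are made up):

        use strict;
        use warnings;
        use IO::Socket::INET;
        use IO::Select;

        my $server = IO::Socket::INET->new(
            LocalPort => 8080,
            Listen    => 5,
            Reuse     => 1,
        ) or die "listen: $!";

        my $client = $server->accept or die "accept: $!";
        my $sel    = IO::Select->new($client);

        while (1) {
            # ... do one slice of search work here ...
            if ( $sel->can_read(0.5) ) {
                my $n = sysread( $client, my $buf, 4096 );
                unless ($n) {    # 0 bytes (EOF) or undef (error): client is gone
                    warn "client disconnected, stopping search\n";
                    last;
                }
            }
        }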


Re: CGI and Threads
by sundialsvc4 (Abbot) on Jul 03, 2011 at 13:09 UTC

    Most sites that I know of use a JavaScript routine that periodically sends dummy AJAX requests.
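
    On the Perl side, those heartbeats can drive the shutdown (a sketch: I'm assuming the AJAX handler touches a per-session file, and the 15-second cutoff is arbitrary):

        use strict;
        use warnings;

        # The AJAX ping CGI just touches this file on every request, e.g.
        #   open my $fh, '>', $beat and close $fh;
        my $beat = "/tmp/heartbeat.$ENV{SESSION_ID}";    # hypothetical session id

        sub client_still_there {
            my $mtime = ( stat $beat )[9] or return 0;    # no file: no client
            return time - $mtime < 15;    # no ping for 15s: assume the user left
        }

        while ( client_still_there() ) {
            # ... do one unit of search work ...
        }
        exit;    # user is gone: end the process and its detached threads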

    Nevertheless ... your purely thread-based strategy won’t scale up. All I’d have to do to cause a denial-of-service attack is to see to it that your service became popular. Instant fork-bomb.

      Funny you should say that: v0.1 of the script actually was an instant fork-bomb, which is exactly why I need to ensure that the threads die after the user disconnects. As for scalability, I do limit the number of threads per user, and there is a fixed number of slots allocated to users, so if no slot is free, the user's request is put in a queue and processed later (sketched below). Both of those are of course tunable, so I can change them depending on needs and the resources available.
      If you have any alternatives in mind for parallel CGI processing, though, I am open to suggestions.
      I have been looking into an AJAX solution, but was wondering if there was an exclusively Perl solution.
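
      For what it's worth, the slot/queue arrangement looks roughly like this (a sketch; the slot count and job list are placeholders):

          use strict;
          use warnings;
          use threads;
          use Thread::Queue;

          my $SLOTS = 4;                      # worker slots available to users
          my $queue = Thread::Queue->new;     # overflow requests wait here

          # A fixed worker pool: when every slot is busy, new requests simply
          # sit in the queue until a worker frees up.
          for ( 1 .. $SLOTS ) {
              threads->create(sub {
                  while ( defined( my $job = $queue->dequeue ) ) {
                      # ... run the search for this request ...
                  }
              })->detach;
          }

          my @requests = map { "request-$_" } 1 .. 10;    # placeholder jobs
          $queue->enqueue($_) for @requests;
          sleep 60;    # keep the main thread alive while the queue drains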

        Sometimes Perlish solutions are more about stepping back and strategizing than anything else. I don't know much about your site, but is it possible to anticipate which pages will be requested, fetch them, and cache the results for a period of time, so that multiple users could benefit from a single fetch? Even if only one user benefited, at least that puts you in control of when fetches happen, rather than doing them in real time, on demand.

        How long before a cached result goes stale is one of the big questions in deciding whether this sort of strategy would work to your benefit, but without more information, it seems like something you could consider.
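
        A file-based version of that idea might look like this (a sketch; the cache directory and ten-minute TTL are assumptions):

            use strict;
            use warnings;
            use Digest::MD5 qw(md5_hex);
            use LWP::UserAgent;

            my $CACHE_DIR = '/tmp/page-cache';
            my $TTL       = 600;    # seconds before a cached copy goes stale
            mkdir $CACHE_DIR unless -d $CACHE_DIR;

            sub fetch_cached {
                my ($url) = @_;
                my $file = "$CACHE_DIR/" . md5_hex($url);

                # Fresh enough? Every user shares the one cached fetch.
                if ( -e $file && time - ( stat $file )[9] < $TTL ) {
                    open my $fh, '<', $file or die "open $file: $!";
                    my $body = do { local $/; <$fh> };
                    return $body;
                }

                # Stale or missing: fetch once, cache, return.
                my $res = LWP::UserAgent->new( timeout => 10 )->get($url);
                die $res->status_line unless $res->is_success;
                open my $fh, '>', $file or die "write $file: $!";
                print $fh $res->decoded_content;
                return $res->decoded_content;
            }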


        Dave

Re: CGI and Threads
by bluescreen (Friar) on Jul 03, 2011 at 15:53 UTC

    Even if you detect the disconnection and manage to shut down the threads (which is very complex), any database or file-system actions inside the thread's code might leave junk behind, e.g. some records inserted but not all of them.
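
    One way to blunt that is to keep the thread's database work inside a transaction, so an aborted run leaves nothing half-written (a sketch; the table and rows are invented):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:SQLite:dbname=results.db', '', '',
            { RaiseError => 1, AutoCommit => 0 } );

        my @rows = ( { job_id => 1, answer => 'x' } );    # placeholder rows

        eval {
            my $sth = $dbh->prepare(
                'INSERT INTO results (job_id, answer) VALUES (?, ?)');
            $sth->execute( $_->{job_id}, $_->{answer} ) for @rows;
            $dbh->commit;    # all rows land, or none do
        };
        # On error, roll back; a killed process's uncommitted work is
        # discarded by the database on disconnect anyway.
        $dbh->rollback if $@;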

    I don't know what your app is doing, but as suggested above I'd recommend Ajax, since it gives you flexibility: maybe not every client needs all of the information, so you can request partial information. It also makes your page feel fast, because you render the page immediately and load results as they arrive, instead of waiting for all the threads to generate the payload before rendering anything.
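
    A minimal polling endpoint for that flow could look like this (a sketch; the Storable result file written by the workers is my assumption):

        #!/usr/bin/perl
        # results.cgi -- the page polls this and renders answers as they arrive
        use strict;
        use warnings;
        use CGI;
        use JSON::PP;
        use Storable qw(retrieve);

        my $q = CGI->new;
        my ($id) = ( $q->param('id') // '' ) =~ /^(\w+)$/;
        $id //= '';
        my $file = "/tmp/results.$id";    # workers store finished answers here

        my $results = ( $id && -e $file ) ? retrieve($file) : [];
        print $q->header('application/json'),
              encode_json( { count => scalar @$results, results => $results } );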

    Another approach, if the work the threads do is limited, is to have a batch job that fetches the information and caches it on your side: faster render times and no need for threads.