
2. How can I display the results as soon as they are available from any source? The application must not wait for all of them.

3. How can I merge all the results when I am asking for incremental display?

You need to employ the services of a seriously experienced web architect. Serving HTML incrementally requires detailed knowledge of both the webserver and the browsers you are seeking to target. I have neither.

Fetching the data from 50 sources concurrently and merging the required results back together is relatively trivial: one thread per source and a common queue. The threads post results to the queue, and the CGI thread reads them off, formats them and serves them.
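The one-thread-per-source, common-queue arrangement can be sketched roughly like this. It is written in Python rather than Perl purely to show the shape of it. `fetch_from_source` is a hypothetical stand-in for the real per-source search, and each thread posts a `None` sentinel when its source is exhausted so that the consumer knows when all of them are done.

```python
import queue
import random
import threading
import time


def fetch_from_source(source_id, query):
    # Hypothetical stand-in for one real remote search; sleeps to mimic network latency.
    time.sleep(random.uniform(0.01, 0.05))
    return [f"{query} result {n} from source {source_id}" for n in range(2)]


def worker(source_id, query, results):
    # One thread per source: query it and post each hit to the shared queue.
    try:
        for item in fetch_from_source(source_id, query):
            results.put(item)
    finally:
        results.put(None)  # sentinel: this source is finished (posted even if it failed)


def parallel_search(query, n_sources=50):
    results = queue.Queue()  # the single common queue all threads post to
    threads = [threading.Thread(target=worker, args=(i, query, results), daemon=True)
               for i in range(n_sources)]
    for t in threads:
        t.start()
    finished = 0
    while finished < n_sources:
        item = results.get()  # blocks until *any* source has posted something
        if item is None:
            finished += 1
        else:
            yield item  # hand each result to the formatter as soon as it arrives
    for t in threads:
        t.join()


if __name__ == "__main__":
    for line in parallel_search("perl threads", n_sources=5):
        print(line)
```

Because `parallel_search` is a generator, the consuming side can format each result the moment it arrives instead of waiting for the slowest source. The "merge" here is simply arrival order. Getting those partial results to the browser is the hard part.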

The difficult part is the web handling. HTTP is a request-response protocol. The browser sends a request; the server sends a response; the browser displays it. And the server won't send anything else until the browser sends another request. So to display results incrementally, you have to arrange for the browser to re-request to get updates.

That can be done with meta tags, JavaScript, or by having the user hit refresh, but each new response will overwrite the browser's display, obliterating what was sent the first time. So for the user to see the results build up incrementally, you have to re-send all the results you sent last time plus any additions. That means the server has to remember what it sent, and to whom. And because HTTP is stateless, that in turn requires a way to identify each user and persistent storage to record what to send to whom. How you go about doing that will depend upon which web server you use, what session mechanism you use, what persistent storage you have, what web-app software/framework/development tools you use, etc.
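A minimal sketch of the "remember what you sent, and to whom" part, again in Python for illustration only. It assumes a session id has already been extracted from a cookie or similar, and it uses a plain in-memory dict as a hypothetical stand-in for the persistent storage discussed above. Each response re-sends everything accumulated so far plus the additions, with a meta refresh tag until the search is complete.

```python
import html

# Hypothetical in-memory store: session id -> every result already sent to that user.
# It only works inside a single long-lived process; a real multi-process web server
# would need the persistent, per-user storage (and cleanup) described above.
sent_so_far = {}


def render_page(session_id, new_results, done, refresh_secs=2):
    accumulated = sent_so_far.setdefault(session_id, [])
    accumulated.extend(new_results)  # last time's results plus any additions
    # While sources are still outstanding, ask the browser to re-request the page.
    refresh = "" if done else f'<meta http-equiv="refresh" content="{refresh_secs}">'
    items = "\n".join(f"<li>{html.escape(r)}</li>" for r in accumulated)
    if done:
        del sent_so_far[session_id]  # nothing more to remember for this user
    return (f"<html><head>{refresh}</head><body>"
            f"<ul>\n{items}\n</ul></body></html>")


if __name__ == "__main__":
    print(render_page("abc", ["first hit"], done=False))
    print(render_page("abc", ["second hit"], done=True))
```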

There's also the question of how your web server is going to handle running 500 concurrent Perl threads. From my very limited understanding of Apache, it doesn't like (Perl) threads much, and even less so if you are also using mod_perl or FastCGI.

If I were trying to do this, I would have the web server hand off the query to a dedicated Perl process. Something like this:

  1. Webserver receives a request for the query form and serves it.
  2. When it receives the completed form, it validates the query and, if it is good, spawns a separate instance of Perl, passing it the query parameters and retrieving the port number that the new Perl instance will listen on.
  3. It then sends the browser a redirect to that port number.

    The browser is now talking directly to a Perl instance dealing with its particular query.

    That Perl instance starts the 50 threads and issues the requests.

  4. When the Perl process receives the redirected request on the port it opened, it formats any results it has received so far and sends the HTML with a meta refresh tag.
  5. Each time the refresh request is received, it adds any new results to those accumulated last time and re-sends the response.
  6. Repeat till done.
  7. Redirect the user back to the web server.
  8. Terminate the Perl process.
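The dedicated process's side of steps 2 to 8 might look something like the following. This is a Python sketch under several assumptions: the parent web server launches it and reads the port number from its first line of output; results arrive on a queue fed by the per-source threads, each posting `None` when finished; only the path `/` is served; and `MAIN_SITE` is a hypothetical URL for the main web server. Binding to port 0 lets the OS choose a free port, and `handle_request()` with a timeout lets the process give up and exit when the user stops refreshing.

```python
import html
import http.server
import queue

# Hypothetical URL of the main web server; the user is sent back here at the end (step 7).
MAIN_SITE = "http://www.example.com/search"


class QueryState:
    """Everything one dedicated per-query process remembers: one user, one query."""

    def __init__(self, results, n_sources):
        self.results = results      # queue.Queue fed by the per-source threads
        self.remaining = n_sources  # sources that have not yet posted their None sentinel
        self.accumulated = []       # every result sent so far (step 5)
        self.finished = False       # set once the final page has been sent

    def drain(self):
        # Pick up whatever has arrived since the last refresh, without blocking.
        while True:
            try:
                item = self.results.get_nowait()
            except queue.Empty:
                return
            if item is None:
                self.remaining -= 1
            else:
                self.accumulated.append(item)


def page(results, refresh):
    items = "\n".join(f"<li>{html.escape(r)}</li>" for r in results)
    return (f'<html><head><meta http-equiv="refresh" content="{refresh}"></head>'
            f"<body><ul>\n{items}\n</ul></body></html>")


def make_handler(state, refresh_secs=2):
    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/":
                self.send_error(404)  # e.g. /favicon.ico; only "/" drives the refresh cycle
                return
            state.drain()
            if state.remaining > 0:
                refresh = str(refresh_secs)       # steps 4-5: ask the browser to come back
            else:
                refresh = f"10; url={MAIN_SITE}"  # steps 6-7: show everything, then go back
                state.finished = True
            body = page(state.accumulated, refresh).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return Handler


class QueryServer(http.server.HTTPServer):
    timed_out = False

    def handle_timeout(self):
        # handle_request() calls this when no request arrives within self.timeout.
        self.timed_out = True


def make_server(state, idle_secs=60):
    # Port 0 asks the OS for any free port: the number reported back in step 2.
    server = QueryServer(("", 0), make_handler(state))
    server.timeout = idle_secs
    return server


def serve_query(state, idle_secs=60):
    server = make_server(state, idle_secs)
    print(server.server_address[1], flush=True)  # the parent web server reads this port
    try:
        while not (state.finished or server.timed_out):
            server.handle_request()
    finally:
        server.server_close()  # step 8: nothing left to clean up; the process can exit
```

In use, the per-source threads would put their results (and a final `None` each) on a `queue.Queue`, and the process would call `serve_query(QueryState(that_queue, n_sources=50))`. The final page uses the `content="N; url=..."` form of meta refresh so the user sees the complete result list before being sent back to the main site.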

This way, as each query is being serviced by a dedicated Perl instance, there is no possibility of mixing up the users and no need for persistent storage that would need to be cleaned up. When the user quits or the session times out, the process terminates and everything is cleaned up automatically.

But I'm not a web guy, so take that with a huge pinch of salt and pay someone for their advice and knowledge. Choose them carefully.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^6: Parallel Search using Thread::Pool
by shanu_040 (Sexton) on Jun 03, 2009 at 04:12 UTC
    Hi,
    Instead of starting a new instance of Perl on another port, can I use SOAP::Lite?
    Actually, I am using SOAP::Lite to create a parallel server so that any number of threads or processes (I am not sure yet which I am going to use) can be created on different web servers at different physical locations.
    Now, if you look at it, there are two Perl instances running:
    1. The main Perl instance (main.pl).
    2. One that runs on a separate server using SOAP::Lite (soap_para_search.pl).
    In main.pl I create the session, cookies, a search object for each source to search, etc. I can then pass the search objects and the query to soap_para_search.pl using SOAP::Lite.
    Now my questions (about threads) are:
    1. Can I use the same server (main.pl) as the parallel server? Yes, I am using mod_perl.
    2. What would the sample code look like? I am also using TT2 for display.
    Thanks
      I am using SOAP::Lite ... I am also using Template::Toolkit v2

      You're using those, but the threads, which you haven't yet decided whether to use or not, are responsible for the slowness of your app. Yeah, right!

      1. Can I use the same server (main.pl) as the parallel server?

        No.

      2. What will be the sample code?

        Whatever sample code you choose to cut & paste.

      Your problem has nothing to do with threads (or processes, or the decision between the two) and everything to do with writing a web application which you apparently do not understand the first principles of.

      I cannot help you further.


        Hi,
        Could you please tell me how I can keep the connection (or request) between the browser and the server open until the last set of records arrives?
        Thanks