in reply to Parallel Search using Thread::Pool

The simplest architecture is to create a Thread::Queue, then have each of your search modules run in a separate thread and enqueue its results as it gets them. Your main thread can then read them off the other end of that shared queue and display them.

Thread::Pool will not be useful to you as it is meant to run many copies of the same routine concurrently, but your application calls for running different subroutines in each of your threads.
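A minimal sketch of that architecture follows. It assumes each per-site search can be wrapped in a sub that takes the search term and returns an array ref of hits; `%searchers`, `site_a`..`site_c` and `parallel_search` are hypothetical names. Each source gets its own thread running its own sub, and the main thread handles each result in completion order through a callback, which is where a real application would display it.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical search routines standing in for the per-site modules.
# Each one is a *different* sub, which is why a pool of identical
# workers (Thread::Pool) is not a natural fit.
my %searchers = (
    site_a => sub { my ($term) = @_; return [ "A: $term 1", "A: $term 2" ] },
    site_b => sub { my ($term) = @_; return [ "B: $term 1" ] },
    site_c => sub { my ($term) = @_; return [] },
);

sub parallel_search {
    my ($term, $searchers, $on_result) = @_;
    my $results = Thread::Queue->new;

    # One thread per search routine; each enqueues its result as soon as
    # it has it (Thread::Queue makes a shared copy of the hash ref).
    my @threads = map {
        my $name = $_;
        my $code = $searchers->{$name};
        threads->create(sub {
            my $hits = eval { $code->($term) } || [];
            $results->enqueue({ source => $name, hits => $hits });
            return;
        });
    } sort keys %$searchers;

    # Main thread: handle each result the moment it arrives, instead of
    # waiting for every thread to finish first.
    for (1 .. @threads) {
        my $r = $results->dequeue;
        $on_result->($r->{source}, $r->{hits});
    }
    $_->join for @threads;
}
```

Called as `parallel_search($term, \%searchers, sub { my ($source, $hits) = @_; ... })`, the callback runs once per source, fastest source first.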

Depending on whether your application is web, GUI or console based, you might also want to use a second queue or a shared scalar to pass new search terms to your threads.
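One possible sketch of the second-queue idea keeps one long-lived thread per source and gives each its own request queue. It assumes a long-running process (a persistent server rather than a one-shot CGI script), since the threads must outlive a single request; `submit_search`, `shutdown_workers` and the searchers are hypothetical names.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical per-source search routines (stand-ins for the real modules).
my %searchers = (
    site_a => sub { my ($term) = @_; return "A results for '$term'" },
    site_b => sub { my ($term) = @_; return "B results for '$term'" },
);

my $results = Thread::Queue->new;    # workers -> main thread
my %requests;                        # main thread -> one queue per worker
my @workers;

# One long-lived worker per source, each blocking on its own request queue.
# A queue per worker means every worker sees every new term; with a single
# shared queue, each term would be picked up by only one worker.
for my $name (sort keys %searchers) {
    my $in   = Thread::Queue->new;
    my $code = $searchers{$name};
    $requests{$name} = $in;
    push @workers, threads->create(sub {
        while (defined(my $term = $in->dequeue)) {
            $results->enqueue([ $term, $name, $code->($term) ]);
        }
    });
}

# Broadcast a new search term; returns how many answers to expect.
sub submit_search {
    my ($term) = @_;
    $_->enqueue($term) for values %requests;
    return scalar keys %requests;
}

# Ending the queues makes dequeue return undef, so the workers exit.
sub shutdown_workers {
    $_->end for values %requests;
    $_->join for @workers;
}
```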


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Parallel Search using Thread::Pool
by shanu_040 (Sexton) on Mar 17, 2009 at 11:22 UTC
    Thanks, my application is web based. I have created a separate Perl module (.pm) for each search source (site). Currently I am using Parallel::ForkManager, but the problem I am facing is that it waits for all the children to finish their tasks, and only then can I display the results. I am stuck on how to display the results as soon as one child retrieves them, and then add the other children's results to the display as they arrive. What algorithm should I use for this?
    Thanks
    Shanu
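With processes instead of threads, the parent still does not have to wait for every child before showing anything: if each child writes its results down its own pipe, the parent can use IO::Select to react to whichever pipe becomes readable first. This sketch uses only core Perl (fork, pipe, IO::Select, POSIX) and assumes a Unix-like system; the search subs and `search_incrementally` are hypothetical, and results travel as one line per record for simplicity. Parallel::ForkManager's `run_on_finish` callback, which can receive a data structure passed to `finish`, is another route worth checking.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical per-source searches; site_a is made slow on purpose.
my %searchers = (
    site_a => sub { sleep 1; return ('A1', 'A2') },
    site_b => sub { return ('B1') },
);

# Fork one child per source; call $on_result->($source, \@hits) in the
# parent as soon as each child has finished, in completion order.
sub search_incrementally {
    my ($term, $searchers, $on_result) = @_;
    my $select = IO::Select->new;
    my (%source_of, %buffer, @pids);

    for my $name (sort keys %$searchers) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                         # child process
            close $reader;
            my @hits = eval { $searchers->{$name}->($term) };
            print {$writer} "$_\n" for @hits;    # one record per line
            close $writer;
            POSIX::_exit(0);                     # skip the parent's END blocks
        }
        close $writer;                           # parent keeps only the read end
        $select->add($reader);
        $source_of{$reader} = $name;
        push @pids, $pid;
    }

    # EOF on a pipe means that child has closed its end, i.e. it is done.
    while ($select->count) {
        for my $fh ($select->can_read) {
            my $n = sysread($fh, my $chunk, 65536);
            if ($n) { $buffer{$fh} .= $chunk; next }
            $select->remove($fh);
            my @hits = split /\n/, ($buffer{$fh} // '');
            $on_result->($source_of{$fh}, \@hits);
            close $fh;
        }
    }
    waitpid($_, 0) for @pids;                    # reap the children
}
```

Called as `search_incrementally($term, \%searchers, sub { my ($source, $hits) = @_; ... })`, the callback for `site_b` should normally fire about a second before the one for `site_a`.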
      Hi monks,
      I am still waiting to get some kind of solution from your side.
Re^2: Parallel Search using Thread::Pool
by shanu_040 (Sexton) on Jun 01, 2009 at 09:34 UTC
    Could you please help me develop this? I have tried Thread::Queue, but it takes too much time to retrieve the results from a source, and I have to search more than 50 sources at a time. More than 10 instances could be running concurrently.
    How can the main thread read the results off the queue and display them?
    Following is the code I am using:
        use threads;
        use Thread::Queue;

        sub run_search {
            my ($self, $searches, $search_string, $site, $max_hits,
                $from_year, $to_year) = @_;
            my $Qwork    = Thread::Queue->new;
            my $Qresults = Thread::Queue->new;
            my $THREADS  = scalar(keys %$searches);
            my @return;

            foreach my $obj (values %$searches) {
                eval {
                    $obj->from_year($from_year);
                    $obj->to_year($to_year);
                    $obj->parse_search();
                };
                if ($@) { print STDERR "problem2 $@\n"; }
                $Qwork->enqueue($obj);
            }
            # One undef per worker marks the end of the work queue.
            $Qwork->enqueue( (undef) x $THREADS );

            my @pool = map {
                threads->create(\&parallel_search,
                    $Qwork, $Qresults, $max_hits, $self->nuc_code)
            } 1 .. $THREADS;

            # Every worker finishes with an undef, so this loop ends only after
            # ALL workers are done: results are collected here, not displayed.
            for (1 .. $THREADS) {
                while (my $result = $Qresults->dequeue) {
                    push(@return, $result);
                }
            }

            ## Clean up the threads
            $_->join for @pool;
            return (\@return);
        }

        #
        # the parallel server
        #
        sub parallel_search {
            my ($Qwork, $Qresults, $max_hits, $nuc_code) = @_;
            my $tid = threads->tid;
            while (my $work = $Qwork->dequeue) {
                # A bare string does nothing; loading a class named at run
                # time needs a string eval.
                my $class = ref($work);
                eval "require $class; 1" or die "Cannot load $class: $@";
                $work->max_hits($max_hits);
                # Fresh hash per item: reusing one hash made each enqueued
                # result also contain all the earlier results.
                my %result;
                $result{$work->resource_id} =
                    $work->get_search_results($work->resource_id, $nuc_code);
                $Qresults->enqueue(\%result);
            }
            $Qresults->enqueue(undef);
        }
      It takes too much time to retrieved the result from a source

      How do you know it is taking too long? How long is too long? How are you measuring it?

      I'll try to help, but you are going to have to explain what you are doing a lot more clearly than you have to date. Are you trying to display the results on a web page as you get them?

      If so, that could be the source of your problems. Whilst not impossible, it is quite difficult to render web pages on-the-fly because HTML simply wasn't designed to work that way.
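One way to attempt on-the-fly rendering is to flush one HTML fragment per source as it arrives. This is a minimal sketch, assuming results arrive through a callback taking a source name and an array ref of hit titles (a hypothetical interface); whether the browser actually paints the partial page still depends on the web server, any proxy buffering and the browser itself, which is part of what makes this difficult.

```perl
use strict;
use warnings;
use IO::Handle;    # for ->flush on filehandles

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Escape the characters that would break the HTML.
sub esc {
    my ($s) = @_;
    $s =~ s/([&<>"])/$entity{$1}/g;
    return $s;
}

# Returns a callback that prints one <li> per source and flushes at once,
# so each fragment leaves the script as soon as that source has answered.
sub html_streamer {
    my ($out) = @_;
    return sub {
        my ($source, $hits) = @_;
        print {$out} '<li><b>', esc($source), '</b>: ',
            join(', ', map { esc($_) } @$hits), "</li>\n";
        $out->flush;
    };
}
```

In a CGI script one might set `$| = 1`, print the header and an opening `<ul>`, pass `html_streamer(\*STDOUT)` as the per-result callback, and print `</ul>` once all sources have reported.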

      If not, then you are going to have to describe or post the overall operation of the application, rather than just keep posting the same basic snippet.

      • What type of application is it?

        GUI, CLI, web app?

      • What are you searching?

        DBs, web pages, other?

      • You mention 50 searches and the possibility of 10 concurrent instances.

        Does each instance search all 50 sources? Are they all searching the same sources?

      I have looked at your earlier posts, but as I do not understand what you are trying to achieve, it's hard to begin to help you. I don't need the specific details of the data, but a clear overview of the dataflows is essential. Also, how long is it taking currently, and what is your target?


        Hi,
        I am working on a metasearch tool.
        A metasearch tool is a software application that:
        • uses multiple protocols
        • to perform simultaneous searches
        • across multiple heterogeneous electronic information resources
        • from a single point of entry.

        How do metasearch tools work?

        Metasearch software makes use of the search functionality built into each target resource it is searching. In general terms, a metasearch application goes through a series of steps to search multiple resources simultaneously and return results to the user.
        Metasearch software:

        (1) converts the user’s search into a query that can be understood by the built-in search of each of the target resources chosen to be searched. I call these Connectors.

        (2) broadcasts the translated query to the selected target resources.

        (3) simultaneously retrieves sets of results from all reachable target resources.

        (4) formats results into a canonical internal format to allow for further manipulation by the metasearch software.

        (5a) displays the results from each resource in its own ranked or sorted list.

        OR

        (5b) displays the results in one merged list, ranked or sorted in some fashion.
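The five steps can be sketched as a small pipeline, assuming each connector is a hash of three code refs: `translate` (step 1), `search` (steps 2 and 3) and `normalize` (step 4). The connector names, query syntaxes and scores are invented for illustration, and the targets are queried one after another to keep the sketch short; in the real tool steps 2 and 3 would run in parallel.

```perl
use strict;
use warnings;

# Hypothetical connectors, one per target resource.
my %connectors = (
    doaj => {
        translate => sub { my ($q) = @_; return "title:($q)" },
        search    => sub { my ($native) = @_; return ({ t => "Open $native", s => 0.9 }) },
        normalize => sub { my ($raw) = @_; return { title => $raw->{t}, score => $raw->{s} } },
    },
    publisher => {
        translate => sub { my ($q) = @_; return join '+', split ' ', $q },
        search    => sub { my ($native) = @_;
                           return (["Pub $native", 0.5], ["Pub2 $native", 0.95]) },
        normalize => sub { my ($raw) = @_; return { title => $raw->[0], score => $raw->[1] } },
    },
);

sub metasearch {
    my ($query, @targets) = @_;
    my %per_source;
    for my $name (@targets) {
        my $c      = $connectors{$name};
        my $native = $c->{translate}->($query);          # 1. translate the query
        my @raw    = $c->{search}->($native);            # 2-3. send it, get hits
        $per_source{$name} =                             # 4. canonical records
            [ map { +{ %{ $c->{normalize}->($_) }, source => $name } } @raw ];
    }
    my %by_source;                                       # 5a. one list per source
    for my $name (keys %per_source) {
        $by_source{$name} =
            [ sort { $b->{score} <=> $a->{score} } @{ $per_source{$name} } ];
    }
    my @merged = sort { $b->{score} <=> $a->{score} }    # 5b. one merged list
                 map { @$_ } values %per_source;
    return (\%by_source, \@merged);
}
```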

        • What type of application is it?
          It is a Web Application.
        • What are you searching?
          Multiple heterogeneous electronic information resources, e.g. DOAJ and publishers' databases. So yes, in effect it searches web pages.
        • You mention 50 searches and the possibility of 10 concurrent instances.
          Yes, each instance may search 50 resources.

        I broadcast the well-formatted search query to the different sources and fetch the results from each target source's connector using WWW::Mechanize. To broadcast the search I am using SOAP::Lite and Parallel::ForkManager.
        For each target source we have written a piece of code (a Connector),
        which does the following:
        • creates the WWW::Mechanize object
        • builds the search URL and gets the search results (the HTML content, via WWW::Mechanize->content)
        • filters out the HTML and other unwanted information, and creates a Record object for each record
        • returns a reference to the recordSet object.
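A connector following those four steps might look like this sketch. `Connector::Example`, `Record`, the search URL and the `<a class="hit">` markup are all hypothetical; a real connector needs its own URL and extraction rule, and a proper HTML parser is usually more robust than the regex used here. The optional `fetch` argument is an assumption added so the parsing can be exercised without a network; when it is absent, WWW::Mechanize (`new`, `get`, `content`) is loaded and used.

```perl
use strict;
use warnings;

package Record;    # hypothetical record class
sub new   { my ($class, %f) = @_; return bless {%f}, $class }
sub title { $_[0]{title} }
sub url   { $_[0]{url} }

package Connector::Example;    # hypothetical connector for one target site

sub new {
    my ($class, %args) = @_;
    return bless {
        base  => $args{base} || 'http://search.example.com/find?q=',
        fetch => $args{fetch},    # optional code ref: URL -> HTML
    }, $class;
}

# Build the search URL, percent-encoding the query.
sub search_url {
    my ($self, $query) = @_;
    (my $q = $query) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $self->{base} . $q;
}

sub fetch_html {
    my ($self, $url) = @_;
    return $self->{fetch}->($url) if $self->{fetch};
    require WWW::Mechanize;                            # loaded only when used
    my $mech = WWW::Mechanize->new(autocheck => 1);    # dies on HTTP errors
    $mech->get($url);
    return $mech->content;
}

# Filter the HTML down to one Record per hit; return the record set ref.
sub get_search_results {
    my ($self, $query) = @_;
    my $html = $self->fetch_html($self->search_url($query));
    my @records;
    while ($html =~ m{<a\s+class="hit"\s+href="([^"]+)">([^<]*)</a>}g) {
        push @records, Record->new(url => $1, title => $2);
    }
    return \@records;
}

package main;
```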

        Now, I need help on the following:
        1. Should I use processes or threads?
        2. How do I display the results as they become available from any source? The application must not wait for all of them.
        3. How do I merge all the results when I am displaying them incrementally?
        4. First I want to prepare a flow diagram. Can I get help with that?
        Looking forward to your response. Thanks
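For question 3, one approach (an assumption, not the only option) is to keep a single merged list ordered by a relevance score and merge each newly arrived source's batch into it, then redisplay. The sketch assumes each record is a hash with hypothetical `url` and `score` keys, treats equal URLs as duplicates, and `IncrementalMerge` is a hypothetical class name.

```perl
use strict;
use warnings;

package IncrementalMerge;

sub new { return bless { merged => [], seen => {} }, shift }

# Merge one source's batch into the list, which stays sorted by descending
# score. Like the merge step of merge sort, this is linear in the list size.
sub add {
    my ($self, $source, @records) = @_;
    my @batch = sort { $b->{score} <=> $a->{score} }
                map  { +{ %$_, source => $source } }
                grep { !$self->{seen}{ $_->{url} }++ } @records;   # drop duplicates
    my $old = $self->{merged};
    my @out;
    my ($i, $j) = (0, 0);
    while ($i < @$old && $j < @batch) {
        push @out, $old->[$i]{score} >= $batch[$j]{score} ? $old->[$i++] : $batch[$j++];
    }
    push @out, @$old[$i .. $#$old], @batch[$j .. $#batch];
    $self->{merged} = \@out;
    return \@out;    # current merged list, ready to redisplay
}

package main;
```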