Re: Parallel Search using Thread::Pool

Re^2: Parallel Search using Thread::Pool by shanu_040 (Sexton) on Mar 17, 2009 at 11:22 UTC
Thanks. My application is web-based. I have created a separate Perl module (.pm) for each search source (site). Currently I am using Parallel::ForkManager, but the problem I am facing is that it waits for all the children to finish their tasks, and only then can I display the results. I am stuck on how to display results as each child retrieves them, and then add the other children's results to the display as they arrive. What would be the algorithm for this problem?

Thanks,
Shanu
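One possible shape for the "display as each child finishes" problem described above, sketched with core Perl only (fork, pipe, IO::Select) rather than Parallel::ForkManager. This is a minimal sketch, not the poster's code: `search_source`, the source names, and the delays are hypothetical stand-ins for the real connectors.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical stand-in for one connector: waits, then returns some text.
sub search_source {
    my ($source, $delay) = @_;
    select(undef, undef, undef, $delay);    # simulate a slow remote site
    return "results from $source";
}

# Fork one child per source and call $on_result->($source, $text) as soon
# as each child finishes, without waiting for the slower ones.
sub search_incrementally {
    my ($sources, $on_result) = @_;         # $sources: { name => delay }
    my $select = IO::Select->new;
    my (%name_for, %buffer_for);

    for my $source (sort keys %$sources) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                    # child: search, write, leave
            close $reader;
            print {$writer} search_source($source, $sources->{$source});
            close $writer;
            POSIX::_exit(0);                # skip END blocks and parent buffers
        }
        close $writer;                      # parent keeps only the read end
        $select->add($reader);
        $name_for{ fileno $reader }   = $source;
        $buffer_for{ fileno $reader } = '';
    }

    while ($select->count) {
        for my $fh ($select->can_read) {    # blocks until some child has data
            my $fd = fileno $fh;
            my $n  = sysread($fh, my $chunk, 4096);
            if ($n) { $buffer_for{$fd} .= $chunk; next; }
            # End of file (or read error): this child is done.
            $select->remove($fh);
            close $fh;
            $on_result->($name_for{$fd}, $buffer_for{$fd});
        }
    }
    1 while wait() != -1;                   # reap all children
    return;
}

$| = 1;    # unbuffer STDOUT so each partial result is sent immediately
search_incrementally(
    { fast_site => 0.1, slow_site => 0.5 },
    sub { my ($source, $text) = @_; print "$source: $text\n" },
);
```

The parent never waits for a particular child; it handles whichever pipe reaches end of file first. In a CGI setting the callback would print an HTML fragment instead of a plain line, and real connectors would need to serialize their record sets before writing them to the pipe.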
Re^3: Parallel Search using Thread::Pool by Anonymous Monk on Mar 17, 2009 at 12:00 UTC
Watching long processes through CGI (Aug 02)
Re^3: Parallel Search using Thread::Pool by shanu_040 (Sexton) on May 22, 2009 at 18:42 UTC
Hi monks, I am still waiting for some kind of solution from your side.
Re^2: Parallel Search using Thread::Pool by shanu_040 (Sexton) on Jun 01, 2009 at 09:34 UTC
Could you please help me to develop this? I have tried Thread::Queue. It takes too much time to retrieve the result from a source, and I have to search more than 50 sources at a time. There could be more than 10 instances running concurrently. How can the main thread read the results off the queue and display them? Following is the code I am using:

    sub run_search {
        my ($self, $searches, $search_string, $site, $max_hits, $from_year, $to_year) = @_;
        my $Qwork    = new Thread::Queue;
        my $Qresults = new Thread::Queue;
        my $THREADS  = scalar(keys %$searches);
        my @return;
        foreach my $obj (values %$searches) {
            eval {
                $obj->from_year($from_year);
                $obj->to_year($to_year);
                $obj->parse_search();
            };
            if ($@) {
                print STDERR "problem2 $@\n";
            }
            $Qwork->enqueue($obj);
        }
        $Qwork->enqueue( (undef) x $THREADS );
        my @pool = map {
            threads->create( \&parallel_search, $Qwork, $Qresults, $max_hits, $self->nuc_code )
        } 1 .. $THREADS;
        for (1 .. $THREADS) {
            while ( my $result = $Qresults->dequeue ) {
                push(@return, $result);
            }
        }
        ## Clean up the threads
        $_->join for @pool;
        return(\@return);
    }

    #
    # the parallel server
    #
    sub parallel_search {
        my ($Qwork, $Qresults, $max_hits, $nuc_code) = @_;
        my $tid = threads->tid;
        my %result;
        while (my $work = $Qwork->dequeue) {
            'require ' . ref($work) . ';';
            $work->max_hits($max_hits);
            $result{$work->resource_id} = $work->get_search_results($work->resource_id, $nuc_code);
            $Qresults->enqueue( \%result );
        }
        $Qresults->enqueue( undef );
    }
Re^3: Parallel Search using Thread::Pool by BrowserUk (Patriarch) on Jun 01, 2009 at 13:41 UTC
"It takes too much time to retrieve the result from a source"

How do you know it is taking too long? How long is too long? How are you measuring it?

I'll try to help, but you are going to have to explain what you are doing a lot more clearly than you have to date.

Are you trying to display the results on a web page as you get them? If so, that could be the source of your problems. Whilst not impossible, it is quite difficult to render web pages on-the-fly because HTML simply wasn't designed to work that way.

If not, then you are going to have to describe or post the overall operation of the application, rather than just keep posting the same basic snippet.

What type of application is it? GUI, CLI, web app? What are you searching? DBs, web pages, other? You mention 50 searches and the possibility of 10 concurrent instances. Does each instance search all 50 sources? Are they all searching the same sources?

I have looked at your earlier posts, but as I do not understand what you are trying to achieve, it's hard to begin to help you. I don't need the specific details of the data, but a clear overview of the dataflows is essential. Also, how long is it taking currently, and what is your target?
Re^4: Parallel Search using Thread::Pool by shanu_040 (Sexton) on Jun 02, 2009 at 04:56 UTC
Hi, I am working on a metasearch tool.

A metasearch tool is a software application that:
• uses multiple protocols
• to perform simultaneous searches
• across multiple heterogeneous electronic information resources
• from a single point of entry.

How do metasearch tools work? Metasearch software makes use of the search functionality built into each target resource it is searching. In general terms, a metasearch application goes through a series of steps to search multiple resources simultaneously and return results to the user. Metasearch software:
(1) converts the user's search into a query that can be understood by the built-in search of each of the target resources chosen to be searched (I call these converters Connectors);
(2) broadcasts the translated query to the selected target resources;
(3) simultaneously retrieves sets of results from all reachable target resources;
(4) formats results into a canonical internal format to allow for further manipulation by the metasearch software;
(5a) displays the results from each resource in its own ranked or sorted list, OR
(5b) displays the results in one merged list, ranked or sorted in some fashion.

What type of application is it?
It is a web application.

What are you searching?
Multiple heterogeneous electronic information resources, e.g. DOAJ and publishers' databases. Yes, I can say it searches web pages.

You mention 50 searches and the possibility of 10 concurrent instances.
Yes, each instance may search 50 resources.

I broadcast the well-formatted search query to the different sources and fetch the results from each target source's connector using WWW::Mechanize. To broadcast the search I am using SOAP::Lite and Parallel::ForkManager. For each target source we have written code (a Connector) which does the following:
• creates the WWW::Mechanize object;
• creates the search URL and gets the search results (HTML content, using WWW::Mechanize->content);
• filters out the HTML and other unwanted information, and creates a Record object for each record;
• returns a reference to the RecordSet object.

Now, I need help on the following:
1. Should I use processes or threads?
2. How do I display the results as they become available from any source? The application must not wait for all of them.
3. How do I merge all the results when I am asking for incremental display?
4. First I want to prepare a flow diagram. Can I get help with that?

Looking forward to your response.
Thanks
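For question 3 above (merging while displaying incrementally), one option is to keep the merged list ranked at all times and fold each source's batch into it as it arrives. The following is only a sketch under assumptions: the record fields `title`, `score`, and `source`, and the idea of ranking by a numeric score, are hypothetical and not taken from the connectors described in this thread.

```perl
use strict;
use warnings;

# Merge one newly arrived batch into an already-ranked list.
# Both arguments are array refs of hash refs; higher 'score' ranks first.
# The field names are hypothetical, chosen only for this illustration.
sub merge_batch {
    my ($merged, $batch) = @_;
    my @new = sort { $b->{score} <=> $a->{score} } @$batch;
    my @out;
    my ($i, $j) = (0, 0);

    # Standard two-way merge of two lists that are each sorted by score.
    while ($i < @$merged && $j < @new) {
        if ($merged->[$i]{score} >= $new[$j]{score}) {
            push @out, $merged->[$i++];
        }
        else {
            push @out, $new[$j++];
        }
    }
    # Append whatever remains of either list.
    push @out, @{$merged}[$i .. $#$merged], @new[$j .. $#new];
    return \@out;
}

my $merged = [];
$merged = merge_batch($merged, [
    { title => 'A', score => 0.4, source => 'DOAJ' },
    { title => 'B', score => 0.9, source => 'DOAJ' },
]);
$merged = merge_batch($merged, [
    { title => 'C', score => 0.7, source => 'publisher' },
]);
print join(', ', map { $_->{title} } @$merged), "\n";    # prints "B, C, A"
```

After each call the page can be redrawn (or a fragment sent) from the current `$merged`, so the display is always consistent with whichever sources have answered so far. Per-source lists (option 5a) need no merge at all; each batch is simply shown under its own heading.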
Re^5: Parallel Search using Thread::Pool by BrowserUk (Patriarch) on Jun 02, 2009 at 10:59 UTC
Re^6: Parallel Search using Thread::Pool by shanu_040 (Sexton) on Jun 03, 2009 at 04:12 UTC