in reply to Speeding up HTML parsing
Another thing that might help is abstracting the code from each engine-specific section into a subroutine. This way you can debug that code, compartmentalize it, and have a more readable flow of execution in your main program. Also, if you ever want to reuse the code that you wrote to get and parse the website results, moving that code into subroutines is the first step.
Here's some example code for you: it abstracts the AltaVista search into its own subroutine and uses Parallel::ForkManager to run the requests in parallel:
#!/usr/bin/perl
use LWP::Simple;
use Parallel::ForkManager;
use strict;

$| = 1;    # unbuffer output so results appear as each child finishes

my @urls = qw(
    http://sulfericacid.perlmonk.org
    http://sulfericacid.com
);

# one child process per URL
my $number_of_forks = scalar @urls;
my $forkmanager     = Parallel::ForkManager->new($number_of_forks);

foreach my $site (@urls) {
    $forkmanager->start and next;    # parent moves on to the next URL; child continues below

    my $altavista_results = altavista_search($site);
    print "Searched http://www.altavista.com for site $site\n";
    print "results: $altavista_results\n";

    $forkmanager->finish;
}
$forkmanager->wait_all_children;

#######################
# Altavista!
#######################
sub altavista_search {
    my $url         = shift;
    my $engine_link = "http://www.altavista.com/web/results?q=link:$url&kl=XX&search=Search";

    my $content = get($engine_link);
    return unless defined $content;    # request failed

    my @lines = split /\n/, $content;
    my $results;
    foreach my $line (@lines) {
        $results = $1 if $line =~ m/AltaVista found (.*) results/;
    }
    return $results;
}
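One thing to keep in mind with Parallel::ForkManager: each search runs in a separate child process, so the example above can only print its result from inside the child. If you'd rather collect the counts back in the parent (to sort them, total them, or print a single report), reasonably recent versions of the module let a child hand a data structure back through finish(), which the parent picks up in a run_on_finish callback. A minimal sketch of that, reusing altavista_search from above:

my %results;
$forkmanager->run_on_finish( sub {
    # ($pid, exit code, identifier passed to start(), signal, core dump flag, data ref)
    my ( $pid, $exit_code, $site, $exit_signal, $core_dump, $data ) = @_;
    $results{$site} = $$data if defined $data;
} );

foreach my $site (@urls) {
    $forkmanager->start($site) and next;    # use the URL as the child's identifier
    my $count = altavista_search($site);
    $forkmanager->finish( 0, \$count );     # ship the result back to the parent
}
$forkmanager->wait_all_children;

print "$_ -> $results{$_}\n" for sort keys %results;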
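And once each engine lives in its own subroutine, adding another one is just a matter of writing another sub with the same shape and calling it from the same loop. The URL and the regex below are placeholders rather than a working query for any real engine; they're only there to show the pattern:

#######################
# Some other engine (placeholder URL and pattern -- fill in the real ones)
#######################
sub other_engine_search {
    my $url         = shift;
    my $engine_link = "http://search.example.com/results?q=link:$url";    # placeholder query
    my $content     = get($engine_link);
    return unless defined $content;

    my ($results) = $content =~ m/found ([\d,]+) results/;                # placeholder pattern
    return $results;
}

Each engine differs only in the query URL and the pattern that pulls the count out of the page, so the main loop just needs one extra call per engine.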