ashish.kvarma has asked for the wisdom of the Perl Monks concerning the following question:
For the last couple of days I have been stuck on an error and have no clue what I might be doing wrong. The code below is a crawler that uses threads to crawl multiple pages at a time. It works fine most of the time, but sometimes it crashes with the following error:
Faulting application perl.exe, version 5.10.0.1003, time stamp 0x482a29fd, faulting module SSLeay.dll, version 0.0.0.0, time stamp 0x482a38ff, exception code 0xc0000005, fault offset 0x0001b323, process id 0x135c, application start time 0x01ca3ba8cbd1d852.

I am using ActiveState Perl 5.10.0.1003 (I have also tried 5.10.0.1005) on Windows Vista (I got the same results on Windows XP).
Here is a brief description of the code, followed by the code itself. EventListner is a class that loops over the contents of a Thread::Queue (its listen method) and, based on those contents, inserts data into the database and does other processing. The Crawler class provides methods to crawl web sites (using WWW::Mechanize) and adds data to the queue. The crawler first logs in to the website and collects the links from which data is to be extracted, and then crawls those pages inside an infinite loop. A crawler returns immediately if there is no data available to extract; if data is found, it stays on the page and refreshes it every 30 seconds for as long as data is available.
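To make the arrangement described above concrete, here is a minimal sketch of one listener thread draining a Thread::Queue while several crawler threads feed it. It is not the actual Crawler or EventListner code: `listen_on`, `crawl`, `run_demo` and the payload strings are hypothetical stand-ins, and the sketch assumes a Thread::Queue recent enough to provide `end()` (3.01 or later).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the listener: drain the queue until it is ended.
sub listen_on {
    my ($q) = @_;
    my @seen;
    # dequeue() blocks until an item arrives; it returns undef once the
    # queue has been ended and is empty.
    while (defined(my $item = $q->dequeue())) {
        # The real listener would insert into the database here.
        push @seen, "$item->{page}:$item->{data}";
    }
    return \@seen;
}

# Hypothetical stand-in for one crawler run: push one record for the page.
sub crawl {
    my ($q, $page) = @_;
    $q->enqueue({ page => $page, data => "payload for $page" });
    return;
}

# Start one listener and one crawler thread per page, then collect results.
sub run_demo {
    my @pages = @_;
    my $q = Thread::Queue->new();

    my $listener = threads->create({ context => 'scalar' }, \&listen_on, $q);
    my @crawlers = map { threads->create(\&crawl, $q, $_) } @pages;

    $_->join() for @crawlers;   # wait for every crawler to finish
    $q->end();                  # no more items: lets the listener loop exit
    return sort @{ $listener->join() };
}

print "$_\n" for run_demo(qw(page1 page2 page3));
```

Unlike the script in this post, the sketch joins its threads instead of detaching them, purely so that it can shut down cleanly and return what the listener saw; it is not offered as a fix for the crash.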
So far I have found only two things

    #!c://perl/bin/perl -w
    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;
    use App::Options;
    use Crawler;
    use EventListner;
    use Data::Dumper;

    my $q = Thread::Queue->new();
    my $options = \%App::options; # Get command line options
    my $page_status = {};
    share $page_status;
    my $page_priorty = {};
    share $page_priorty;

    # Open Listneracks
    my $event_listner = EventListner->new($q, $page_status, $page_priorty);
    my $listner_thr = threads->create(sub { $event_listner->listen(); });
    $listner_thr->detach();

    # Create crawler object
    my $crawler = Crawler->new($q, $options);
    $crawler->login(); # Login
    $crawler->fetch_pages(); # Fetch pages
    my $threads_count :shared = 0;
    while (1) {
        # Get all closed pages and sort on their priority
        my @closed_pages = grep { $page_status->{$_} eq 'C' } keys %$page_status;
        my @priorty_sort_pages = sort {$page_priorty->{$a} <=> $page_priorty->{$b}} @closed_pages;
        foreach my $page (@priorty_sort_pages) {
            if ($threads_count < $options->{max_crawlers}) {
                $threads_count++;
                $page_status->{$page} = 'O';
                $page_priorty->{$page}++;
                my $thr = threads->create(sub {
                    threads->detach();
                    $crawler->run($page);
                    $crawler->reset();
                    unless ($page_status->{$page} eq 'X') {
                        $page_status->{$page} = 'C';
                    }
                    $threads_count--;
                });
                #if ($thr->is_running()) {
                sleep($options->{pause}) if $options->{pause};
                #}
                #print "A\n";
            }
            #print "B\n";
        }
    }
I would appreciate any help in fixing and improving the script. Thank you all in advance.
Replies are listed 'Best First'.
Re: Code using threads crashes intermittently
by ikegami (Patriarch) on Sep 28, 2009 at 03:52 UTC
by ashish.kvarma (Monk) on Sep 28, 2009 at 04:26 UTC

Re: Code using threads crashes intermittently
by diotalevi (Canon) on Sep 29, 2009 at 03:01 UTC

Re: Code using threads crashes intermittently
by ashish.kvarma (Monk) on Oct 04, 2009 at 05:47 UTC