Hi,

For the last couple of days I have been stuck on an error and have no clue what I might be doing wrong. The code below is a crawler that uses threads to crawl multiple pages at a time. It works fine most of the time, but sometimes it crashes with the following error:

Faulting application perl.exe, version 5.10.0.1003, time stamp 0x482a29fd, faulting module SSLeay.dll, version 0.0.0.0, time stamp 0x482a38ff, exception code 0xc0000005, fault offset 0x0001b323, process id 0x135c, application start time 0x01ca3ba8cbd1d852.

I am using ActiveState Perl 5.10.0.1003 (I have also tried 5.10.0.1005) on Windows Vista (I got the same results on Windows XP).

Here is a brief description of the code, followed by the code itself.

EventListner is a class that loops over the Thread::Queue contents (its listen method) and, based on those contents, inserts data into the database and does other processing. The Crawler class provides methods to crawl the web sites (using WWW::Mechanize) and adds data to the queue. The crawler first logs in to the website and collects the links from which data is to be extracted; then, inside an infinite loop, it crawls those pages. The crawler returns immediately if there is no data available to extract. If data is found, it stays on the page and refreshes it every 30 seconds for as long as data is available.
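To make the arrangement above concrete, here is a minimal sketch of the queue pattern only. The names listen_loop, crawl_page and run_demo are hypothetical stand-ins, not the real EventListner or Crawler code, and the undef sentinel used to stop the listener is an assumption made so the example can terminate (the real listener runs detached and never stops).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for EventListner->listen(): block on the queue and
# collect items until an undef sentinel arrives.
sub listen_loop {
    my ($q) = @_;
    my @seen;
    while (defined(my $item = $q->dequeue())) {
        push @seen, $item;    # the real listener would write to the database here
    }
    return @seen;
}

# Hypothetical stand-in for Crawler->run($page): add one record to the queue.
sub crawl_page {
    my ($q, $page) = @_;
    $q->enqueue("data from $page");
    return;
}

# Start one listener thread and one crawler thread per page, then shut down.
sub run_demo {
    my @pages = @_;
    my $q = Thread::Queue->new();

    # Ask for list context explicitly so join() returns everything collected.
    my $listener = threads->create({ context => 'list' }, \&listen_loop, $q);

    my @crawlers = map { threads->create(\&crawl_page, $q, $_) } @pages;
    $_->join() for @crawlers;

    $q->enqueue(undef);       # assumed sentinel: tells the listener to stop
    return $listener->join();
}

# The order of the output lines depends on thread scheduling.
print "$_\n" for run_demo(qw(page1 page2));
```

Unlike the real script, this sketch joins its threads instead of detaching them, which is only done here so that the collected items can be returned and inspected.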

#!c://perl/bin/perl -w
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use App::Options;
use Crawler;
use EventListner;
use Data::Dumper;

my $q = Thread::Queue->new();
my $options = \%App::options; # Get command line options
my $page_status = {};
share $page_status;
my $page_priorty = {};
share $page_priorty;

# Open Listneracks
my $event_listner = EventListner->new($q, $page_status, $page_priorty);
my $listner_thr = threads->create(sub { $event_listner->listen(); });
$listner_thr->detach();

# Create crawler object
my $crawler = Crawler->new($q, $options);
$crawler->login();       # Login
$crawler->fetch_pages(); # Fetch pages

my $threads_count :shared = 0;
while (1) {
    # Get all closed pages and sort on there priorty
    my @closed_pages = grep { $page_status->{$_} eq 'C' } keys %$page_status;
    my @priorty_sort_pages = sort {$page_priorty->{$a} <=> $page_priorty->{$b}} @closed_pages;
    foreach my $page (@priorty_sort_pages) {
        if ($threads_count < $options->{max_crawlers}) {
            $threads_count++;
            $page_status->{$page} = 'O';
            $page_priorty->{$page}++;
            my $thr = threads->create(sub {
                threads->detach();
                $crawler->run($page);
                $crawler->reset();
                unless ($page_status->{$page} eq 'X') {
                    $page_status->{$page} = 'C';
                }
                $threads_count--;
            });
            #if ($thr->is_running()) {
                sleep($options->{pause}) if $options->{pause};
            #}
            #print "A\n";
        }
        #print "B\n";
    }
}

So far I have found only two things

I would appreciate any help in solving this problem and improving the script. Thank you all in advance.

Regards,
Ashish

In reply to Code using threads crashes intermittently by ashish.kvarma
