in reply to Re: Invalid value for shared scalar
in thread Invalid value for shared scalar

Crap... that pretty much kills the design of this program, as I was passing that %siteHash variable between threads, which contained a URI object. Will Perl ever support something like that? Or is "true" multithreaded applications pretty much out of the question?

Replies are listed 'Best First'.
Re^3: Invalid value for shared scalar
by BrowserUk (Patriarch) on Jan 30, 2006 at 22:12 UTC
    Or is "true" multithreaded applications pretty much out of the question?

    It's perfectly possible to create multithreaded applications. It's not the threading that is biting you, it's the sharing of objects.

    For reasons of architecture which are too boring to repeat, you cannot (directly, and you don't want to know about the indirect mechanisms), share objects between threads. You can share data--that is scalars, hashes and arrays. The trick is to pass data between the threads and create objects within threads--for the exclusive use of that thread.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: Invalid value for shared scalar
by ikegami (Patriarch) on Jan 30, 2006 at 21:36 UTC
    You could pass the URI as a string instead of as an object.
      is there any way to make an object sharable (like one I create) such as my Page class?
Re^3: Invalid value for shared scalar
by BrowserUk (Patriarch) on Jan 31, 2006 at 04:49 UTC

    Since it looks like you are trying to create a multi-threaded spider, here is a crude one to get you started. It is simple, it doesn't actually store what it fetches anywhere and could be much cleverer about making links absolute etc., but it demonstrates a few ideas.

    Usage:

    perl -s script.pl [-THREADS=2] url [url...]

    Use ^C to have it abort cleanly and produce some simple stats about what it did.

    #! perl -slw use strict; use threads; use threads::shared; use Thread::Queue; use LWP::UserAgent; use HTML::LinkExtor; use URI; $|=1; our $THREADS ||= 2; sub fetcher { my $tid = threads->self->tid; my( $Qwork, $hashref ) = @_; my $ua = new LWP::UserAgent; my $p = new HTML::LinkExtor; while( my $next = $Qwork->dequeue ) { next if exists $hashref->{ $next }; my $res = $ua->get( $next ); $hashref->{ $next } = $res->code; print "$tid: Failed to fetch $next", next unless $res->is_succ +ess; $p->parse( $res->content ); print "Fetched $next"; my @links = map{ my( $tag, %attrs ) = @$_; $tag eq 'a' ? $attrs{ href } : () } $p->links; printf "$tid: Stacked %d links\n", scalar @links; $Qwork->enqueue( @links ); } } ## A place to store the urls visited. ## Checked by all threads my %urls : shared; ## A queue of links to fetch and follow ## Added to and read by all threads my $Qwork = new Thread::Queue; ## Prime the pump $Qwork->enqueue( @ARGV ); ## Start the fetchers my @fetchers = map{ threads->create( \&fetcher, $Qwork, \%urls ) } 1 .. $THREADS; ## A flag to abort the main thread and clean up my $stop = 0; ## ^C aborts $SIG{INT} = sub{ ## Empty the work queue $Qwork->dequeue while $Qwork->pending; ## And stack an undef for each fetcher to abort their main loops $Qwork->enqueue( (undef) x @fetchers ); $stop = 1; }; ## Log some progress information printf "Fetched: %d (in Q: %d)\n", scalar keys %urls, $Qwork->pending + while not $stop and sleep 5; ## Wait for the fetchers to stop $_->join for @fetchers; ## Gen some stats my( %stats, $total ); ++$total, ++$stats{ $_ } for values %urls; ## And print them print "Atempted: $total"; print "Status: $_ ( $stats{ $_ } )" for sort{$a<=>$b} keys %stats;

    Be careful where you let it loose.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks so much. This is working great, with one problem. The ^C doesn't seem to be working. It kills the programs and errors with "Signal SIGINT received, but no signal handler set." Should I be importing some other module, or registering it a different way? Thanks
      $SIG{INT} = sub { print "got a signal\n"; ## Empty the work queue $Qdownload->dequeue while $Qdownload->pending; ## And stack an undef for each fetcher to abort their main loops $Qdownload->enqueue( (undef) x @downloaders ); $stop = 1; };

        Hm. All I can tell you is that it works for me on my system.

        c:\Perl\test>t-spider.pl http://perlmonks.com/index.pl Fetched: 0 (in Q: 0) Fetched http://perlmonks.com/index.pl 1: Stacked 195 links Fetched: 59 (in Q: 121) Fetched http://oreilly.com/promos/perlbooks/?CMP=OTC-FN3412256845 1: Stacked 19 links Fetched http://perl-oak.sf.net 2: Stacked 5 links Fetched: 64 (in Q: 139) Fetched http://cvs.sourceforge.net/viewcvs.py/perl-oak/Oak/liboak2-per +l/lib/Oak2/ 1: Stacked 43 links Fetched: 137 (in Q: 74) Fetched http://www.cafepress.com/perlmonks,perlmonks_too,pm_more 1: Stacked 76 links Fetched: 141 (in Q: 2) Fetched http://everydevel.com 1: Stacked 104 links Fetched http://search.cpan.org/perldoc?DBI#fetchall_arrayref 2: Stacked 459 links Attempted: 143 Status: 200 ( 7 ) Status: 400 ( 136 )

        If you are on Win32, post the output from Perl -V and I'll see if I can isolate the difference.

        If you are on some flavour of *nix, then maybe ^C doesn't generate a SIGINT? You'll have to talk to someone with knowledge of your platform to resolve this issue.

        I should add that using an interupt handler this way is not really an "architected" solution, just a simple mechanism that allowed me to produce a reasonably clean demonstration without getting into the nitty gritty of a full fledged application.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.