in reply to Re: LWP, DBI and Free to Wrong Pool error
in thread LWP, DBI and Free to Wrong Pool error

I was sharing an LWP::UserAgent object between threads, based on advice from some monks that it was OK to do so. I changed that, and just tested again, and same problem. Here's the code for the main loop:
my $pm = new Parallel::ForkManager(20); for my $i (0..$#itemstoget) { $pm->start and next; # do the fork my ($id,$url,$title,$excerpt) = @{ $itemstoget[$i] }; my $user_agent = LWP::UserAgent->new; $user_agent->timeout(30); my $request = HTTP::Request->new('GET', $url); my $response = $user_agent->request($request); my $dbh = connectdb('blogdb'); if ($response->is_success) { unless (isforeignlanguage($response,$title,$excerpt,$u +rl)) { my $html = resolve_charset($response->content); # if the html meets the criteria for at least one +client that claims it, extract the text my $itemok = checkhtml($dbh,$id,$html); if ($itemok) { inserthtml($dbh,$id,$html); print "OK $id ",substr($url,0,50),"\n"; } else { print "SKIP $url\n"; dosql($dbh,"update blogitems set getattempts=9 +99 where id=$id"); } } else { print "FOREIGN $url\n"; dosql($dbh,"update blogitems set getattempts=999 w +here id=$id"); } } else { print "FAILED $url\n"; dosql($dbh,"update blogitems set getattempts=getattempts + + 1 where id=$id"); } $dbh->disconnect; undef $user_agent; $pm->finish; } $pm->wait_all_children;
As for the subs, resolve_charset figures out and decodes the charset, isforeignlanguage applies Lingua::Identify to see if it's English ir not, itemok makes sure the post contains requisite keywords, and inserthtml puts it into the database.

As for threads vs. pseudofork, it's bc I'm still learning this multi-threaded stuff and pseudofork seems more straightforward. Would using threads instead solve this problem?

Replies are listed 'Best First'.
Re^3: LWP, DBI and Free to Wrong Pool error
by BrowserUk (Patriarch) on Apr 28, 2007 at 21:48 UTC
    Would using threads instead solve this problem?

    Maybe. You could try something like this, which compiles clean but is obviously untested.

    #! perl -slw use strict; use threads; use threads::shared; use Thread::Queue; use constant NTHREADS => 20; my @itemstoget = (); ## Get the items from somewhere? my $Qwork = new Thread::Queue; $Qwork->enqueue( join chr(0), @{ $_ } ) for @itemstoget; $Qwork->enqueue( (undef) x NTHREADS ); my $Qresults = new Thread::Queue; my $running : shared = 0; threads->new( \&thread, $Qwork, $Qresults )->detach for 1 .. NTHREADS; my $dbh = connectdb( 'blogdb' ); sleep 1 until $Qresults->pending; while( $running or $Qresults->pending ) { ## Modified condition sleep( 1 ), next unless $Qresults->pending; my( $id, $url, $html ) = split chr(0), $Qresults->dequeue; if( $html ne 'FAILED' ) { # if the html meets the criteria for at least one client that +claims it, # extract the text if( checkhtml( $dbh, $id, $html ) ) { inserthtml( $dbh, $id, $html ); print "OK $id ",substr( $url, 0, 50 ),"\n"; } else { print "SKIP $url\n"; dosql( $dbh,"update blogitems set getattempts=999 where id +=$id" ); } } else { print "FOREIGN or FAILED $url\n"; dosql($dbh,"update blogitems set getattempts=999 where id=$id" +); } } $dbh->disconnect; exit; sub thread { { lock $running; ++$running } my( $Qwork, $Qresults ) = @_; my $user_agent = LWP::UserAgent->new; $user_agent->timeout( 30 ); while( my $item = $Qwork->dequeue ) { my( $id, $url, $title, $excerpt ) = split chr(0), $item; my $request = HTTP::Request->new( 'GET', $url ); my $response = $user_agent->request( $request ); my $html = ( $response->is_success and not isforeignlanguage( $response, $title, $excerpt, $u +rl ) ) ? resolve_charset( $response->content ) : 'FAILED'; $Qresults->enqueue( join chr(0), $id, $url, $html ); } undef $user_agent; { lock $running; --$running } }

    It should avoid the reentrancy problem with DBI by only accessing the DB from the main thread. It queues up the work items in a shared queue and starts 20 threads to fetch the urls. It then performs the sanity checks on the response before posting either html fetched or a failure code back to the main thread for processing to the database.

    The threads will stop once the queue empties and the main thread will stop once there are no further results to process.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Many thanks kind monk, that certaintainly did solve the problem. Not only that it runs twice as fast. Well OK maybe that's an exaggeration, but it's a lot faster.

      The only problem seems to be that it doesn't exit when the queue is empty. Does $running perhaps need to be modified inside the loop? I'm not sure because I don't really understand how the statement $running : shared = 0; works. Is the colon a conditional operator or does it tie $running to something in threads::shared?

      Steve

        Check the loop condition, I just moved the sleep. That may not be the only logic error. It was a mind-only exercise and in between watching a movie to boot.

        Is the colon ... or does it tie $running to something in threads::shared?

        Yes. It makes $running shared by all threads. If you look at the top and bottom of thread(), you'll see that when a thread starts, it increments that variable. And decrements it when it finishes. The upshot is that when all the threads have finished, because they have exhausted the in-bound $Qwork od item to fetch, $running drops to zero and the main thread will see that and terminate.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      I was getting Free to wrong pool last year when I attempted to use the
      Windows Fork with Tk being used in the child process. The details of
      my problem set were different than yours but perhaps the overall
      problem set and solution approach could be similar.

      After some digging, the problem turned out that Tk was not thread
      safe. My answer was to use the fork earlier on in the program, and not
      "use Tk" in any module prior to the module that used the fork.

      The parent does some things and waits a prescribed time for the child
      to finish just as before, only instead of the child pop up any error
      message, which is what used Tk, the child now writes any reports to a
      temp file (that is known to the parent). After the child exits - or
      is killed by the parent - the parent then returns the file contents
      to the caller, and a subsequent module handles using Tk to pop up any
      error messages.

      One more thing. If you decide to use the Windows fork, do not use the
      sleep command in the parent or child. It hangs. Instead, use

      select(undef, undef, undef,whatever);
        ne more thing. If you decide to use the Windows fork, do not use the sleep command in the parent or child. It hangs.

        You sure? I've never encountered that:

        C:\test>perl -le"if(fork){sleep 5;print'parent'}else{sleep 7;print'chi +ld'}" parent child

        Got code to demonstrate it?

        Also, Win32::Sleep( $milliseconds ); is better that 4-arg select, and is built-in (doesn't need a module).


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.