in reply to Re: Threads and LWP::UserAgent question
in thread Threads and LWP::UserAgent question

Your module certainly contains less puerile cruft (optimise=>cpu/memory which does and has never done anything; the whole Thread::Conveyor 'abstraction to far' and much more), than Thread::Pool.

Minus all that cruft, your module stands at least a fighting chance of being debuggable (Ie. maintainable).

It does however exhibit a few um...peculiarities. Like

while ( 1 ){ { lock ${$self->{working threads}}; my $pending; { lock ${$self->{poolcount}}; $pending = ${$self->{poolcount}}; } if ( ${$self->{working threads}} + $pending > 0 ){ cond_wait ${$self->{workingthreads}}; } else { return; } } }

I assume that the inner anonymous block is intended to restrict the range of effect of the lock statement, but given there is nothing outside of that anonymous block, within the encompassing while block, it is entirely superfluous.

In new(), you have ${$self->{poolcount}} = 0;. At other points in the code you have lock ${$self->{poolcount}}; and ${$self->{poolcount}} --; and cond_signal ${$self->{poolcount}}; etc.

The elements of hashes are scalars. Why are you autovivifying scalar references into those scalars? As best as I can tell, you're just complicating the code and making it run more slowly.

These and similar constructs are rife throughout the module. You also appear to be doing far more locking than is necessary, but that's instinct rather than careful analysis talking.

But the main problem with your modules is that (on the basis of inspection at least), it appears to have one major, fundamental flaw in common with Thread::Pool. It completely misses the point of "pooling threads". Ithreads are relatively expensive to both start and destroy--a side effect of emulation forking. Badly.

The benefit of using a "pool of iThreads", is that you avoid those overheads, by starting long-running threads that loop over a shared source of 'work tokens', (usually by polling a mod::Thread::Queue) and 're-using' existing threads to perform repetitive tasks.

As best I can tell without having run your code, in common with Thread::Pool, you are discarding old threads and starting new ones for each repetition of a basic task. Like disposable nappies, this may be convenient, but it is wasteful of resources.

In addition, there are immense problems with the implementation of the OPs code--by the OP and by cut&paste, by your "test code". These mostly revolve around--as correctly identified by moritz--the use of global, and package global variables which appear to allow the main thread to share the links scavenged, but in reality result in a bunch of thread-uniquely accessible lists, which no other thread can ever access, and which get discard when you throw away the originating thread.

In essence, as coded, all of the work done to extract the lists of links in the threads is entirely wasted, because they are deposited into a CLONED, non-shared, thread-specific piece of storage, that gets discard as soon as the work has been done to build it.

Sorry to be negative, but as it stands, your 400 line module could be better replaced with 15 lines of direct threads code. Better, because the result would be a program that worked, rather than just ran.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^3: Threads and LWP::UserAgent question
by misc (Friar) on Jun 10, 2008 at 10:49 UTC
    Yes, there are probably too many locks.

    When I wrote this module some time ago I had to struggle with deadlocks.
    So I finished with locking all variables always in the same order,
    also in mind it would be easier to insert new functions into the code.

    When the whole thing worked, I didn't mind to clean up..

    At the place you mentioned: Do you mean cond_wait would also unlock ${$self->{poolcount}} or that locking ${$self->{poolcount}} is simply unneccessary ?
    I didn't trie so much to have few locks, would you say that a lock is big penalty ?

    I used the hash $self everywhere because I had in mind to be able to have more than one threadpool in an script.

    You are however wrong about the creating of the threads,
    the module doesn't destroy any thread at all.
    Instead it creates up to maxthread threads and puts the work recieved with enqueue() onto a queue.
    The threads pull in T_thread() the work of the queue.

    I wrote this module, because I had some problems with growing memory of long running scripts, due to creating and destroying threads.

    The OP's code..
    The only thing I've changed has been the small delay (select undef,undef,undef,0.2;).
    Since the OP didn't say exactly what he tries to do and where all the data should go, I didn't had a closer look, instead I really did copy'n paste.

    I've just been curious whether I could replace Thread::Pool with my module,
    since this worked without problems I thought my implementation could be helpful.
    There's also the point that the feedback, especially yours, helps me improve my code.

    So, no, I don't feel you would be negative.
      my original code was a simple version of a spider I'm working on. It was badly written and turned out to be the wrong thing actually. So, I've been cobbling a new version together.