gatito has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to write a very simple multi-threaded spider that can go out and fetch links from a number of pages at once. My basic single-threaded script works fine (everything below the subroutine comment) when it is its own script.

However, when I call it as a job from a thread pool, I can't get a response via LWP::UserAgent on my requests. I originally abstracted away the subroutine stuff as its own package, but that didn't work either (I thought that might help if there is a scoping error).

Given that I have no experience with multithreading other than playing around with a few of the modules and making threads print stuff concurrently, am I doing something mind-numbingly stupid or is there something worse going on?

Thanks for advice in advance.
#!c:/usr/bin/perl

# CPAN modules
use Thread::Pool;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

$pool = Thread::Pool->new( {
    optimize   => 'cpu',           # default: memory
    do         => 'XXX',           # must have
    checkpoint => \&checkpoint,
    frequency  => 1000,
    workers    => 5,               # default: 1
    maxjobs    => 50,              # default: 5 * workers
    minjobs    => 5,               # default: maxjobs / 2
} );

# Start up some threads
$pool->job( "http://www.yahoo.com" );
$pool->job( "http://www.eroticfalconry.com" );
$pool->join;    # wait for all threads to finish

#### subroutines

sub XXX {
    $current_url = @_[0];
    @queue = ();
    while ( $current_url ne "" ) {
        @links = get( "$current_url" );
        print "links in url\n********************\n";
        print join( "\n", @links ), "\n";
        push( @queue, @links );
        #print "links in queue\n********************\n";
        #print join("\n", @queue), "\n";
        $current_url = pop( @queue );
        print ">>>>new url=$current_url\n";    # this is where we go next
    }
}

my $p = HTML::LinkExtor->new( \&callback );
my @links = ();

# Set up a callback that collects links
sub callback {
    my ( $tag, %attr ) = @_;
    return if $tag ne 'a';    # we only look closer at <a ...>
    push( @links, values %attr );
}

sub get {
    $url = "@_[0]";
    print "Entering html::getlinks, url=$url\n";
    my $ua = LWP::UserAgent->new;
    $res = $ua->request( HTTP::Request->new( GET => $url ),
                         sub { $p->parse( $_[0] ) } );
    $base = $res->base;
    # Expand all URLs to absolute ones
    @links = map { $_ = url( $_, $base )->abs; } @links;
    return @links;
}

$version = 1;

Replies are listed 'Best First'.
Re: Threads and LWP::UserAgent question
by ikegami (Patriarch) on Jun 09, 2008 at 05:14 UTC

    I can't get a response via LWP::UserAgent on my requests.

    What does that mean? I'm under the impression that request returns a response for both successes and failures.

    And how did you ascertain this conclusion?

    I thought that might help if there is a scoping error

    You have tons of those. use strict; will reveal them.

    My basic single threaded script works fine

    A quick look at LWP's and HTML-Parser's documentation shows no indication that they are thread-safe.

    LWP appears to be completely written in Perl, so it's *probably* safe.

    HTML-Parser, on the other hand, is partially written in C. It's much more likely for a C module to not be thread-safe (since variables are shared by all threads). What happens if you remove all references to HTML::LinkExtor (temporarily)?
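    To run that experiment, a crude pure-Perl link extractor can stand in for HTML::LinkExtor while testing. This is only an illustration for isolating the problem, not a real substitute: parsing HTML with a regex is fragile, and the sample markup below is made up.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Crude, pure-Perl link extraction -- no HTML::Parser XS code involved.
    # Only good enough to test whether HTML::LinkExtor is the thread-safety
    # culprit; a regex is NOT a real substitute for an HTML parser.
    sub extract_hrefs {
        my ($html) = @_;
        return $html =~ /<a\s[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
    }

    my @links = extract_hrefs(
        '<a href="http://example.com/">x</a> <p>no</p> <a href="/rel">y</a>'
    );
    print join( "\n", @links ), "\n";
    ```

    If the spider works with this in place of HTML::LinkExtor, the finger points at the XS parser; if it still fails, the problem is elsewhere.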

      I can't get a response via LWP::UserAgent on my requests.

      I should have been a bit clearer. With a single-threaded app I have no trouble getting a response object and then getting its content (the HTML of the page). In the multi-threaded case I wasn't able to get the HTML at all, and after toying around with my code I got a 400 error to appear.

Re: Threads and LWP::UserAgent question
by moritz (Cardinal) on Jun 09, 2008 at 04:53 UTC
    Two things came to my mind while reading your question:
    1. Did you try LWP::Parallel? It might be just the right tool for your job
    2. When working with threads, try to keep the number of global variables minimal; that is, declare all variables with my where possible. Using strict helps you to detect cases where you forgot some declarations. Global variables have to be cloned for each thread, which consumes quite some memory.

    Apart from that, using use strict; use warnings; also helps avoiding mistakes.

      When working with threads, try to keep the number of global variables minimal

      Why? While that's good advice in general — it's easier to make sure that local variables are in the proper state than it is for global variables — I don't see why it's important for threads specifically. I can see this being an issue for *shared* variables (since their access needs to be controlled by locks), but other package, lexical, global and local variables are per-thread by default.

        Why?

        Because global variables have to be copied for each created thread. Admittedly it's not such a big problem as with shared variables, but if you're careless they can accumulate to quite some size.

        One of the reasons that perl threads aren't exactly lightweight is that there are just too many global variables (think $_ $/ $\ $, $; @ARGV  @INC ...) which all have to be copied for each created thread. And it's one of the reasons why Perl 6 tries to reduce the number of global variables wherever possible, and invents context variables instead.

        Specifically @links (in the root node) looks weird to me - it's not a shared variable, but it's still a global variable (by virtue of being mentioned in sub XXX), and later declared as lexical in the main program. @queue is also global, although used only in XXX.
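        The cloning behaviour is easy to demonstrate in a few lines. This sketch (a made-up example, not taken from the OP's code) shows a package global being cloned at thread-creation time, so a push inside the thread never reaches the main thread's copy:

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;
        use threads;

        # Each new thread gets its own CLONED copy of every package global.
        our @links = ('seed');

        my $seen_in_thread = threads->create( sub {
            push @links, 'found-in-thread';   # touches only this thread's clone
            return scalar @links;             # the clone now holds 2 entries
        } )->join;

        # The main thread's copy never changed: the child's push is lost.
        print "in thread: $seen_in_thread, in main: ", scalar @links, "\n";
        ```

        This is exactly why the links gathered inside the pool's worker threads never show up anywhere useful.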

        If I ran my subroutines as external modules, wouldn't that make the global variable issue a non-issue? Doesn't each call from the top level live in its own thread?
      1) I saw the module and briefly read about it but have not tried to use it. I suppose it might work, but I took my route as I figured it would be more robust ( with multithreading - ha! ) once I scale up to 10+ spiders each accessing a database.

      2) Will do
Re: Threads and LWP::UserAgent question
by BrowserUk (Patriarch) on Jun 09, 2008 at 05:53 UTC
    am I doing something mind-numbingly stupid

    Unfortunately yes. You made the mistake of using a CPAN module, Thread::Pool, that should have been strangled at birth. Try contacting the author/maintainer for support.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Threads and LWP::UserAgent question
by misc (Friar) on Jun 09, 2008 at 13:12 UTC
    Endangering myself getting strangled..

    Some time ago I wrote my own threadpool implementation..
    I was curious whether my implementation has the same problems..
    I cannot tell why, but the version below works as expected,
    while some mysterious things (I also didn't look into the sources) seem to happen in Thread::Pool.

    I also added a small delay between the requests.

    michael

      Your module certainly contains less puerile cruft than Thread::Pool does (optimise => cpu/memory, which does nothing and never has; the whole Thread::Conveyor 'abstraction too far'; and much more).

      Minus all that cruft, your module stands at least a fighting chance of being debuggable (Ie. maintainable).

      It does however exhibit a few um...peculiarities. Like

      while ( 1 ) {
          {
              lock ${ $self->{workingthreads} };
              my $pending;
              {
                  lock ${ $self->{poolcount} };
                  $pending = ${ $self->{poolcount} };
              }
              if ( ${ $self->{workingthreads} } + $pending > 0 ) {
                  cond_wait ${ $self->{workingthreads} };
              }
              else {
                  return;
              }
          }
      }

      I assume that the inner anonymous block is intended to restrict the scope of the lock statement, but given that there is nothing outside of that anonymous block within the encompassing while block, it is entirely superfluous.

      In new(), you have ${$self->{poolcount}} = 0;. At other points in the code you have lock ${$self->{poolcount}}; and ${$self->{poolcount}} --; and cond_signal ${$self->{poolcount}}; etc.

      The elements of hashes are scalars. Why are you autovivifying scalar references into those scalars? As best as I can tell, you're just complicating the code and making it run more slowly.

      These and similar constructs are rife throughout the module. You also appear to be doing far more locking than is necessary, but that's instinct rather than careful analysis talking.

      But the main problem with your module is that (on the basis of inspection at least) it appears to have one major, fundamental flaw in common with Thread::Pool: it completely misses the point of "pooling threads". iThreads are relatively expensive to both start and destroy--a side effect of emulating forking. Badly.

      The benefit of using a "pool of iThreads" is that you avoid those overheads by starting long-running threads that loop over a shared source of 'work tokens' (usually by polling a Thread::Queue), 're-using' existing threads to perform repetitive tasks.

      As best I can tell without having run your code, in common with Thread::Pool, you are discarding old threads and starting new ones for each repetition of a basic task. Like disposable nappies, this may be convenient, but it is wasteful of resources.

      In addition, there are immense problems with the implementation of the OP's code--by the OP, and, via cut&paste, in your "test code". These mostly revolve around--as correctly identified by moritz--the use of global and package-global variables, which appear to allow the main thread to share the links scavenged, but in reality result in a bunch of thread-uniquely-accessible lists which no other thread can ever access, and which get discarded when you throw away the originating thread.

      In essence, as coded, all of the work done to extract the lists of links in the threads is entirely wasted, because they are deposited into a CLONED, non-shared, thread-specific piece of storage that gets discarded as soon as the work has been done to build it.
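      The cure is to make the results array genuinely shared. A minimal sketch (the thread bodies here are stand-ins for the real link extraction):

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use threads;
      use threads::shared;

      # Declare the results array :shared so every thread pushes into the
      # SAME storage, and protect concurrent pushes with lock().
      my @links :shared;

      my @t = map {
          my $n = $_;
          threads->create( sub {
              lock @links;
              push @links, "link-from-thread-$n";
          } );
      } 1 .. 3;
      $_->join for @t;

      print "collected ", scalar @links, " links\n";
      ```

      Without the :shared attribute, each thread would push into its own clone and the main thread would see an empty array after the joins.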

      Sorry to be negative, but as it stands, your 400 line module could be better replaced with 15 lines of direct threads code. Better, because the result would be a program that worked, rather than just ran.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Yes, there are probably too many locks.

        When I wrote this module some time ago I had to struggle with deadlocks.
        So I ended up locking all variables, always in the same order,
        also with the idea that it would be easier to insert new functions into the code.

        When the whole thing worked, I didn't bother to clean up..

        At the place you mentioned: Do you mean cond_wait would also unlock ${$self->{poolcount}}, or that locking ${$self->{poolcount}} is simply unnecessary?
        I didn't try very hard to keep the number of locks down; would you say that a lock is a big penalty?

        I used the hash $self everywhere because I had in mind being able to have more than one threadpool in a script.

        You are, however, wrong about the creation of the threads:
        the module doesn't destroy any threads at all.
        Instead it creates up to maxthread threads and puts the work received with enqueue() onto a queue.
        The threads pull the work off the queue in T_thread().

        I wrote this module, because I had some problems with growing memory of long running scripts, due to creating and destroying threads.

        The OP's code..
        The only thing I've changed has been the small delay (select undef,undef,undef,0.2;).
        Since the OP didn't say exactly what he's trying to do and where all the data should go, I didn't take a closer look; instead I really did copy'n'paste.

        I've just been curious whether I could replace Thread::Pool with my module,
        since this worked without problems I thought my implementation could be helpful.
        There's also the point that the feedback, especially yours, helps me improve my code.

        So, no, I don't feel you would be negative.
      Huh, your code works fine on my desktop PC, and doesn't work on my laptop. You should submit that to CPAN as a replacement for the current garbage.
        What do you mean by "it doesn't work" on your laptop?
        Are there any error messages ?